Lecture notes for Math 115A (linear algebra) 
Fall of 2002 
Terence Tao, UCLA 
http://www.math.ucla.edu/~tao/resource/general/115a.3.02f/ 
The textbook used was Linear Algebra, S.H. Friedberg, A.J. Insel, L.E.
Spence, Third Edition. Prentice Hall, 1999.
Thanks to Radhakrishna Bettadapura, Yu Cao, Cristian Gonzales, Hannah 
Kim, Michael Smith, Wilson Sov, Luging Ye, and Shijia Yu for corrections. 


Math 115A - Week 1 
Textbook sections: 1.1-1.6 
Topics covered: 


• What is Linear algebra?
• Overview of course
• What is a vector? What is a vector space?
• Examples of vector spaces
• Vector subspaces
• Span, linear dependence, linear independence
• Systems of linear equations
• Bases
* * * * *


What is Linear algebra?


• This course is an introduction to Linear algebra. Linear algebra is the
study of linear transformations and their algebraic properties.


• A transformation is any operation that transforms an input to an
output. A transformation is linear if (a) every amplification of the input
causes a corresponding amplification of the output (e.g. doubling of the
input causes a doubling of the output), and (b) adding inputs together
leads to adding of their respective outputs. [We’ll be more precise
about this much later in the course.]


• A simple example of a linear transformation is the map y := 3x, where
the input x is a real number, and the output y is also a real number.
Thus, for instance, in this example an input of 5 units causes an output
of 15 units. Note that a doubling of the input causes a doubling of the
output, and if one adds two inputs together (e.g. add a 3-unit input
with a 5-unit input to form an 8-unit input) then the respective outputs
(9-unit and 15-unit outputs, in this example) also add together (to form
a 24-unit output). Note also that the graph of this linear transformation
is a straight line (which is where the term linear comes from).


(Footnote: I use the symbol := to mean “is defined as”, as opposed to
the symbol =, which means “is equal to”. (It’s similar to the distinction
between the symbols = and == in computer languages such as C++,
or the distinction between causation and correlation). In many texts
one does not make this distinction, and uses the symbol = to denote
both. In practice, the distinction is too fine to be really important, so
you can ignore the colons and read := as = if you want.)


An example of a non-linear transformation is the map y := x^2; note
now that doubling the input leads to quadrupling the output. Also if 
one adds two inputs together, their outputs do not add (e.g. a 3-unit 
input has a 9-unit output, and a 5-unit input has a 25-unit output, but 
a combined 3 + 5-unit input does not have a 9 + 25 = 34-unit output, 
but rather a 64-unit output!). Note the graph of this transformation is 
very much non-linear. 
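The two defining properties can be spot-checked numerically on the two maps above. The following Python sketch is only an illustration: the helper name `looks_linear` and the sample inputs are assumptions made here, and passing on a handful of samples is evidence of linearity, not a proof.

```python
def looks_linear(f, samples, scalars):
    """Spot-check homogeneity f(c*x) == c*f(x) and additivity
    f(x+y) == f(x)+f(y) on the given samples (not a proof)."""
    homogeneous = all(f(c * x) == c * f(x) for c in scalars for x in samples)
    additive = all(f(x + y) == f(x) + f(y) for x in samples for y in samples)
    return homogeneous and additive


def triple(x):
    return 3 * x      # the linear map y := 3x


def square(x):
    return x ** 2     # the non-linear map y := x^2


samples = [3, 5, 8]
scalars = [2, -1, 10]

print(looks_linear(triple, samples, scalars))  # True
print(looks_linear(square, samples, scalars))  # False
print(triple(3) + triple(5), triple(8))        # 24 24
print(square(3) + square(5), square(8))        # 34 64
```

The last two lines reproduce the arithmetic in the text: the outputs of 3x add up (9 + 15 = 24), while those of x^2 do not (9 + 25 = 34, but the combined input gives 64).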


In real life, most transformations are non-linear; however, they can of- 
ten be approximated accurately by a linear transformation. (Indeed, 
this is the whole point of differential calculus - one takes a non-linear 
function and approximates it by a tangent line, which is a linear func- 
tion). This is advantageous because linear transformations are much 
easier to study than non-linear transformations. 


In the examples given above, both the input and output were scalar 
quantities - they were described by a single number. However in many 
situations, the input or the output (or both) is not described by a 
single number, but rather by several numbers; in which case the input 
(or output) is not a scalar, but instead a vector. [This is a slight 
oversimplification - more exotic examples of input and output are also 
possible when the transformation is non-linear.]


A simple example of a vector-valued linear transformation is given by 
Newton’s second law 


F = ma, or equivalently a = F/m.




One can view this law as a statement that a force F applied to an
object of mass m causes an acceleration a, equal to a := F/m; thus
F can be viewed as an input and a as an output. Both F and a are
vectors; if for instance F is equal to 15 Newtons in the East direction
plus 6 Newtons in the North direction (i.e. F := (15, 6) N), and the
object has mass m := 3 kg, then the resulting acceleration is the vector
a = (5, 2) m/s^2 (i.e. 5 m/s^2 in the East direction plus 2 m/s^2 in the
North direction).


Observe that even though the input and outputs are now vectors in 
this example, this transformation is still linear (as long as the mass 
stays constant); doubling the input force still causes a doubling of the 
output acceleration, and adding two forces together results in adding 
the two respective accelerations together. 


One can write Newton’s second law in co-ordinates. If we are in three
dimensions, so that F := (F_x, F_y, F_z) and a := (a_x, a_y, a_z), then the law
can be written as

$$\begin{aligned}
F_x &= m a_x + 0 a_y + 0 a_z \\
F_y &= 0 a_x + m a_y + 0 a_z \\
F_z &= 0 a_x + 0 a_y + m a_z.
\end{aligned}$$


This linear transformation is associated to the matrix 


$$\begin{pmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{pmatrix}.$$


Here is another example of a linear transformation with vector inputs
and vector outputs:

$$\begin{aligned}
y_1 &= 3x_1 + 5x_2 + 7x_3 \\
y_2 &= 2x_1 + 4x_2 + 6x_3;
\end{aligned}$$

this linear transformation corresponds to the matrix

$$\begin{pmatrix} 3 & 5 & 7 \\ 2 & 4 & 6 \end{pmatrix}.$$


As it turns out, every linear transformation corresponds to a matrix, 
although if one wants to split hairs the two concepts are not quite the 
same thing. [Linear transformations are to matrices as concepts are to 
words; different languages can encode the same concept using different 
words. We'll discuss linear transformations and matrices much later in 
the course.]
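To see concretely how a matrix encodes a linear transformation, here is a minimal Python sketch. The function name `apply_matrix` and the test vectors are assumptions made for illustration; the first matrix is the one from Newton's second law with m = 3, and the second uses the rows (3, 5, 7) and (2, 4, 6) as the reading assumed here for the second example.

```python
def apply_matrix(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector (a list)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have as many entries as the vector")
    return [sum(entry * x for entry, x in zip(row, vector)) for row in matrix]


# Newton's second law F = ma with m = 3, as a 3 x 3 matrix acting on a.
m = 3
newton = [[m, 0, 0],
          [0, m, 0],
          [0, 0, m]]
print(apply_matrix(newton, [5, 2, 0]))   # [15, 6, 0]

# The second example: three inputs, two outputs.
A = [[3, 5, 7],
     [2, 4, 6]]
x = [1, 0, 2]
y = [0, 1, 1]
print(apply_matrix(A, x))                # [17, 14]
print(apply_matrix(A, y))                # [12, 10]

# Adding the inputs first gives the sum of the two outputs.
x_plus_y = [a + b for a, b in zip(x, y)]
print(apply_matrix(A, x_plus_y))         # [29, 24]
```

The final line illustrates additivity: the output for x + y equals the sum of the outputs for x and y.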


• Linear algebra is the study of the algebraic properties of linear
transformations (and matrices). Algebra is concerned with how to
manipulate symbolic combinations of objects, and how to equate one such
combination with another; e.g. how to simplify an expression such as
(a - 3)(a + 5). In linear algebra we shall manipulate not just scalars,
but also vectors, vector spaces, matrices, and linear transformations.
These manipulations will include familiar operations such as addition,
multiplication, and reciprocal (multiplicative inverse), but also new
operations such as span, dimension, transpose, determinant, trace,
eigenvalue, eigenvector, and characteristic polynomial. [Algebra is distinct
from other branches of mathematics such as combinatorics (which is
more concerned with counting objects than equating them) or analysis
(which is more concerned with estimating and approximating objects,
and obtaining qualitative rather than quantitative properties).]


* * * * *


Overview of course 


• Linear transformations and matrices are the focus of this course.
However, before we study them, we first must study the more basic concepts
of vectors and vector spaces; this is what the first two weeks will cover.
(You will have had some exposure to vectors in 32AB and 33A, but
we will need to review this material in more depth - in particular we
concentrate much more on concepts, theory and proofs than on
computation). One of our main goals here is to understand how a small set
of vectors (called a basis) can be used to describe all other vectors in
a vector space (thus giving rise to a co-ordinate system for that vector
space).


• In weeks 3-5, we will study linear transformations and their co-ordinate
representation in terms of matrices. We will study how to multiply two
transformations (or matrices), as well as the more difficult question of
how to invert a transformation (or matrix). The material from weeks
1-5 will then be tested in the midterm for the course.


After the midterm, we will focus on matrices. A general matrix or linear 
transformation is difficult to visualize directly, however one can under- 
stand them much better if they can be diagonalized. This will force us 
to understand various statistics associated with a matrix, such as deter- 
minant, trace, characteristic polynomial, eigenvalues, and eigenvectors; 
this will occupy weeks 6-8. 


In the last three weeks we will study inner product spaces, which are 
a fancier version of vector spaces. (Vector spaces allow you to add 
and scalar multiply vectors; inner product spaces also allow you to 
compute lengths, angles, and inner products). We then review the 
earlier material on bases using inner products, and begin the study 
of how linear transformations behave on inner product spaces. (This 
study will be continued in 115B). 


Much of the early material may seem familiar to you from previous 
courses, but I definitely recommend that you still review it carefully, as 
this will make the more difficult later material much easier to handle. 


* * * * *


What is a vector? What is a vector space? 


We now review what a vector is, and what a vector space is. First let 
us recall what a scalar is. 


Informally, a scalar is any quantity which can be described by a sin- 
gle number. An example is mass: an object has a mass of m kg for 
some real number m. Other examples of scalar quantities from physics 
include charge, density, speed, length, time, energy, temperature, vol- 
ume, and pressure. In finance, scalars would include money, interest 
rates, prices, and volume. (You can think up examples of scalars in 
chemistry, EE, mathematical biology, or many other fields). 


The set of all scalars is referred to as the field of scalars; it is usually
just R, the field of real numbers, but occasionally one likes to work
with other fields such as C, the field of complex numbers, or Q, the
field of rational numbers. However in this course the field of scalars will
almost always be R. (In the textbook the scalar field is often denoted
F, just to keep aside the possibility that it might not be the reals R;
but I will not bother trying to make this distinction.)


Any two scalars can be added, subtracted, or multiplied together to 
form another scalar. Scalars obey various rules of algebra, for instance 
x + y is always equal to y + x, and x * (y + z) is equal to x * y + x * z.


Now we turn to vectors and vector spaces. Informally, a vector is any 
member of a vector space; a vector space is any class of objects which 
can be added together, or multiplied with scalars. (A more popular, 
but less mathematically accurate, definition of a vector is any quantity 
with both direction and magnitude. This is true for some common 
kinds of vectors - most notably physical vectors - but is misleading or 
false for other kinds). As with scalars, vectors must obey certain rules 
of algebra. 


Before we give the formal definition, let us first recall some familiar 
examples. 


The vector space R^2 is the space of all vectors of the form (x, y), where
x and y are real numbers. (In other words, R^2 := {(x, y) : x, y ∈ R}).
For instance, (-4, 3.5) is a vector in R^2. One can add two vectors in R^2
by adding their components separately, thus for instance (1, 2) + (3, 4) =
(4, 6). One can multiply a vector in R^2 by a scalar by multiplying each
component separately, thus for instance 3 * (1, 2) = (3, 6). Among all
the vectors in R^2 is the zero vector (0, 0). Vectors in R^2 are used for
many physical quantities in two dimensions; they can be represented
graphically by arrows in a plane, with addition represented by the
parallelogram law and scalar multiplication by dilation.


The vector space R^3 is the space of all vectors of the form (x, y, z),
where x, y, z are real numbers: R^3 := {(x, y, z) : x, y, z ∈ R}. Addition
and scalar multiplication proceed similarly to R^2: (1, 2, 3) + (4, 5, 6) =
(5, 7, 9), and 4 * (1, 2, 3) = (4, 8, 12). However, addition of a vector in
R^2 to a vector in R^3 is undefined; (1, 2) + (3, 4, 5) doesn’t make sense.
Among all the vectors in R^3 is the zero vector (0, 0, 0). Vectors in R^3 are
used for many physical quantities in three dimensions, such as velocity,
momentum, current, electric and magnetic fields, force, acceleration,
and displacement; they can be represented by arrows in space.
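The componentwise rules above translate directly into code. This is a small Python sketch, assuming vectors are stored as tuples; the function names `add` and `scale` are illustrative choices, not notation from the course.

```python
def add(v, w):
    """Add two vectors componentwise; both must lie in the same R^n."""
    if len(v) != len(w):
        raise ValueError(
            "cannot add a vector in R^%d to a vector in R^%d" % (len(v), len(w))
        )
    return tuple(a + b for a, b in zip(v, w))


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * a for a in v)


print(add((1, 2), (3, 4)))          # (4, 6)
print(scale(3, (1, 2)))             # (3, 6)
print(add((1, 2, 3), (4, 5, 6)))    # (5, 7, 9)
print(scale(4, (1, 2, 3)))          # (4, 8, 12)

try:
    add((1, 2), (3, 4, 5))          # R^2 + R^3 is undefined
except ValueError as err:
    print("undefined:", err)
```

Raising an error for mismatched lengths is a design choice made here to mirror the remark that (1, 2) + (3, 4, 5) does not make sense.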


• One can similarly define the vector spaces R^4, R^5, etc. Vectors in
these spaces are not often used to represent physical quantities, and
are more difficult to represent graphically, but are useful for describing
populations in biology, portfolios in finance, or many other types of
quantities which need several numbers to describe them completely.


* * * * *


Definition of a vector space 


• Definition. A vector space is any collection V of objects (called
vectors) for which two operations can be performed:

• Vector addition, which takes two vectors v and w in V and returns
another vector v + w in V. (Thus V must be closed under addition).

• Scalar multiplication, which takes a scalar c in R and a vector v in V,
and returns another vector cv in V. (Thus V must be closed under
scalar multiplication).

• Furthermore, for V to be a vector space, the following properties must
be satisfied:

• (I. Addition is commutative) For all v, w ∈ V, v + w = w + v.

• (II. Addition is associative) For all u, v, w ∈ V, u + (v + w) = (u + v) + w.

• (III. Additive identity) There is a vector 0 ∈ V, called the zero vector,
such that 0 + v = v for all v ∈ V.

• (IV. Additive inverse) For each vector v ∈ V, there is a vector -v ∈ V,
called the additive inverse of v, such that -v + v = 0.

• (V. Multiplicative identity) The scalar 1 has the property that 1v = v
for all v ∈ V.

• (VI. Multiplication is associative) For any scalars a, b ∈ R and any
vector v ∈ V, we have a(bv) = (ab)v.

• (VII. Multiplication is linear) For any scalar a ∈ R and any vectors
v, w ∈ V, we have a(v + w) = av + aw.

• (VIII. Multiplication distributes over addition) For any scalars a, b ∈ R
and any vector v ∈ V, we have (a + b)v = av + bv.
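The eight axioms can be spot-checked mechanically for the familiar space R^n. The Python sketch below is an illustration under stated assumptions: vectors are tuples, the name `check_axioms` is hypothetical, and integer entries stand in for real numbers so that every comparison is exact (floating-point numbers can violate associativity through rounding). A random test of this kind can reveal a failing axiom but cannot prove that the axioms hold.

```python
import random


def add(v, w):
    return tuple(a + b for a, b in zip(v, w))


def scale(c, v):
    return tuple(c * a for a in v)


def check_axioms(u, v, w, a, b):
    """Return the names of the axioms (I-VIII) that fail for these inputs."""
    zero = tuple(0 for _ in v)
    neg_v = scale(-1, v)          # candidate additive inverse of v
    checks = {
        "I": add(v, w) == add(w, v),
        "II": add(u, add(v, w)) == add(add(u, v), w),
        "III": add(zero, v) == v,
        "IV": add(neg_v, v) == zero,
        "V": scale(1, v) == v,
        "VI": scale(a, scale(b, v)) == scale(a * b, v),
        "VII": scale(a, add(v, w)) == add(scale(a, v), scale(a, w)),
        "VIII": scale(a + b, v) == add(scale(a, v), scale(b, v)),
    }
    return [name for name, ok in checks.items() if not ok]


random.seed(0)
n = 4
for _ in range(100):
    u, v, w = (tuple(random.randint(-9, 9) for _ in range(n)) for _ in range(3))
    a, b = random.randint(-9, 9), random.randint(-9, 9)
    assert check_axioms(u, v, w, a, b) == []
print("all eight axioms held on 100 random samples in R^4")
```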


* * * * *


(Not very important) remarks 


• The number of properties listed is long, but they can be summarized
briefly as: the laws of algebra work! They are all eminently reasonable;
one would not want to work with vectors for which v + w ≠ w + v, for
instance. Verifying all the vector space axioms seems rather tedious,
but later we will see that in most cases we don’t need to verify all of
them.


• Because addition is associative (axiom II), we will often write
expressions such as u + v + w without worrying about which order the vectors
are added in. Similarly from axiom VI we can write things like abv.
We also write v - w as shorthand for v + (-w).


• A philosophical point: we never say exactly what vectors are, only
what vectors do. This is an example of abstraction, which appears
everywhere in mathematics (but especially in algebra): the exact
substance of an object is not important, only its properties and functions.
(For instance, when using the number “three” in mathematics, it is
unimportant whether we refer to three rocks, three sheep, or whatever;
what is important is how to add, multiply, and otherwise manipulate
these numbers, and what properties these operations have). This is
tremendously powerful: it means that we can use a single theory (linear
algebra) to deal with many very different subjects (physical vectors,
population vectors in biology, portfolio vectors in finance, probability
distributions in probability, functions in analysis, etc.). [A similar
philosophy underlies “object-oriented programming” in computer science.]
Of course, even though vector spaces can be abstract, it is often very
helpful to keep concrete examples of vector spaces such as R^2 and R^3
handy, as they are of course much easier to visualize. For instance,
even when dealing with an abstract vector space we shall often still
just draw arrows in R^2 or R^3, mainly because our blackboards don’t
have all that many dimensions.


• Because we chose our field of scalars to be the field of real numbers R,
these vector spaces are known as real vector spaces, or vector spaces over
R. Occasionally people use other fields, such as complex numbers C, to
define the scalars, thus creating complex vector spaces (or vector spaces
over C), etc. Another interesting choice is to use functions instead of
numbers as scalars (for instance, one could have an indeterminate x,
and let things like 4x^3 + 2x + 5 be scalars, and (4x^3 + 2x^2 + 5, x^4 - 4)
be vectors). We will stick almost exclusively with the real scalar field
in this course, but because of the abstract nature of this theory, almost
everything we say in this course works equally well for other scalar
fields.


• A pedantic point: The zero vector is often denoted 0, but technically
it is not the same as the zero scalar 0. But in practice there is no harm 
in confusing the two objects: zero of one thing is pretty much the same 
as zero of any other thing. 


* * * * *


Examples of vector spaces 


• n-tuples as vectors. For any integer n ≥ 1, the vector space R^n is
defined to be the space of all n-tuples of reals (x_1, x_2, ..., x_n). These
are ordered n-tuples, so for instance (3, 4) is not the same as (4, 3); two
vectors (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n) are equal only if
x_1 = y_1, x_2 = y_2, ..., and x_n = y_n. Addition of vectors is defined by

$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$$

and scalar multiplication by

$$c(x_1, x_2, \ldots, x_n) = (cx_1, cx_2, \ldots, cx_n).$$

The zero vector is

$$0 = (0, 0, \ldots, 0)$$

and additive inverse is given by

$$-(x_1, x_2, \ldots, x_n) = (-x_1, -x_2, \ldots, -x_n).$$


A typical use of such a vector is to count several types of objects. For 
instance, a simple ecosystem consisting of X units of plankton, Y units 
of fish, and Z whales might be represented by the vector (X,Y, Z). 
Combining two ecosystems together would then correspond to adding 
the two vectors; natural population growth might correspond to mul- 
tiplying the vector by some scalar corresponding to the growth rate. 
(More complicated operations, dealing with how one species impacts 
another, would probably be dealt with via matrix operations, which 
we will come to later). As one can see, there is no reason for n to be 
restricted to two or three dimensions. 


The vector space axioms can be verified for R^n, but it is tedious to do
so. We shall just verify one axiom here, axiom VIII: (a + b)v = av + bv.
We can write the vector v in the form v := (x_1, x_2, ..., x_n). The
left-hand side is then

$$(a+b)v = (a+b)(x_1, x_2, \ldots, x_n) = ((a+b)x_1, (a+b)x_2, \ldots, (a+b)x_n)$$

while the right-hand side is

$$\begin{aligned}
av + bv &= a(x_1, x_2, \ldots, x_n) + b(x_1, x_2, \ldots, x_n) \\
&= (ax_1, ax_2, \ldots, ax_n) + (bx_1, bx_2, \ldots, bx_n) \\
&= (ax_1 + bx_1, ax_2 + bx_2, \ldots, ax_n + bx_n)
\end{aligned}$$

and the two sides match since (a + b)x_j = ax_j + bx_j for each
j = 1, 2, ..., n.


There are of course other things we can do with R^n, such as taking dot
products, lengths, angles, etc., but those operations are not common 
to all vector spaces and so we do not discuss them here. 


Scalars as vectors. The scalar field R can itself be thought of as a
vector space - after all, it has addition and scalar multiplication. It
is essentially the same space as R^1. However, this is a rather boring
vector space and it is often confusing (though technically correct) to
refer to scalars as a type of vector. Just as R^2 represents vectors in
a plane and R^3 represents vectors in space, R^1 represents vectors in a
line.


The zero vector space. Actually, there is an even more boring vector
space than R - the zero vector space R^0 (also called {0}), consisting
solely of a single vector 0, the zero vector, which is also sometimes
denoted () in this context. Addition and multiplication are trivial:
0 + 0 = 0 and c0 = 0. The space R^0 represents vectors in a point.
Although this space is utterly uninteresting, it is necessary to include
it in the pantheon of vector spaces, just as the number zero is required
to complete the set of integers.


Complex numbers as vectors. The space C of complex numbers
can be viewed as a vector space over the reals; one can certainly add two
complex numbers together, or multiply a complex number by a (real)
scalar, with all the laws of arithmetic holding. Thus, for instance, 3 + 2i
would be a vector, and an example of scalar multiplication would be
5(3 + 2i) = 15 + 10i. This space is very similar to R^2, although complex
numbers enjoy certain operations, such as complex multiplication and
complex conjugate, which are not available to vectors in R^2.


Polynomials as vectors I. For any n ≥ 0, let P_n(R) denote the
vector space of all polynomials of one indeterminate variable x whose
degree is at most n. Thus for instance P_3(R) contains the “vectors”

$$x^3 + 2x^2 + 4; \quad x^2 - 4; \quad -1.5x^3 + 2.5x + 7; \quad 0$$

but not

$$x^4 + x + 1; \quad \sqrt{x}; \quad \sin(x) + e^x; \quad x^3 + x^{-3}.$$

Addition, scalar multiplication, and additive inverse are defined in the
standard manner, thus for instance

$$(x^3 + 2x^2 + 4) + (-x^3 + x^2 + 4) = 3x^2 + 8 \qquad (0.1)$$

and

$$3(x^3 + 2x^2 + 4) = 3x^3 + 6x^2 + 12.$$

The zero vector is just 0.


• Notice in this example it does not really matter what x is. The space
P_n(R) is very similar to the vector space R^{n+1}; indeed one can match
one to the other by the pairing

$$a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 \Longleftrightarrow (a_n, a_{n-1}, \ldots, a_1, a_0),$$

thus for instance in P_3(R), the polynomial x^3 + 2x^2 + 4 would be
associated with the 4-tuple (1, 2, 0, 4). The more precise statement here
is that P_n(R) and R^{n+1} are isomorphic vector spaces; more on this
later. However, the two spaces are still different; for instance we can
do certain operations in P_n(R), such as differentiate with respect to x,
which do not make much sense for R^{n+1}.


• Notice that we allow the polynomials to have degree less than n; if we
only allowed polynomials of degree exactly n, then we would not have 
a vector space because the sum of two vectors would not necessarily be 
a vector (see (0.1)). (In other words, such a space would not be closed 
under addition). 
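The pairing between polynomials and coefficient tuples makes these remarks easy to experiment with. The Python sketch below assumes polynomials are stored as tuples of coefficients, highest degree first, as in the pairing above; the names `poly_add`, `poly_scale` and `degree` are illustrative only.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient tuples, highest degree first."""
    n = max(len(p), len(q))
    p = (0,) * (n - len(p)) + tuple(p)   # pad with leading zeros
    q = (0,) * (n - len(q)) + tuple(q)
    return tuple(a + b for a, b in zip(p, q))


def poly_scale(c, p):
    """Multiply every coefficient of p by the scalar c."""
    return tuple(c * a for a in p)


def degree(p):
    """Degree of p; by a convention chosen here, the zero polynomial gets -1."""
    for i, a in enumerate(p):
        if a != 0:
            return len(p) - 1 - i
    return -1


p = (1, 2, 0, 4)     # x^3 + 2x^2 + 4
q = (-1, 1, 0, 4)    # -x^3 + x^2 + 4

print(poly_add(p, q))                                  # (0, 3, 0, 8)
print(poly_scale(3, p))                                # (3, 6, 0, 12)
print(degree(p), degree(q), degree(poly_add(p, q)))    # 3 3 2
```

The last line shows why polynomials of degree exactly 3 would not be closed under addition: both summands have degree 3, but their sum 3x^2 + 8 has degree 2.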


• Polynomials as vectors II. Let P(R) denote the vector space of all
polynomials of one indeterminate variable x - regardless of degree. (In
other words, P(R) := ∪_{n=0}^∞ P_n(R), the union of all the P_n(R)). Thus
this space in particular contains the monomials

$$1, x, x^2, x^3, x^4, \ldots$$

though of course it contains many other vectors as well.


• This space is much larger than any of the P_n(R), and is not
isomorphic to any of the standard vector spaces R^n. Indeed, it is an infinite
dimensional space - there are infinitely many “independent” vectors in
this space. (More on this later).


• Functions as vectors I. Why stick to polynomials? Let C(R) denote
the vector space of all continuous functions of one real variable x - thus
this space includes as vectors such objects as

$$x^4 + x + 1; \quad \sin(x) + e^x; \quad x^3 + \pi - \sin(x); \quad |x|.$$

One still has addition and scalar multiplication:

$$(\sin(x) + e^x) + (x^3 + \pi - \sin(x)) = x^3 + e^x + \pi$$

$$5(\sin(x) + e^x) = 5\sin(x) + 5e^x,$$

and all the laws of vector spaces still hold. This space is substantially
larger than P(R), and is another example of an infinite dimensional
vector space.


Functions as vectors II. In the previous example the real variable
x could range over all the real line R. However, we could instead
restrict the real variable to some smaller set, such as the interval [0, 1],
and just consider the vector space C([0, 1]) of continuous functions on
[0, 1]. This would include such vectors as

$$x^4 + x + 1; \quad \sin(x) + e^x; \quad x^3 + \pi - \sin(x); \quad |x|.$$

This looks very similar to C(R), but this space is a bit smaller because
more functions are equal. For instance, the functions x and |x| are
the same vector in C([0, 1]), even though they are different vectors in
C(R).


Functions as vectors III. Why stick to continuous functions? Let
F(R, R) denote the space of all functions of one real variable,
regardless of whether they are continuous or not. In addition to all the
vectors in C(R) the space F(R, R) contains many strange objects, such
as the function

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

This space is much, much, larger than C(R); it is also infinite
dimensional, but it is in some sense “more infinite” than C(R). (More
precisely, the dimension of C(R) is countably infinite, but the
dimension of F(R, R) is uncountably infinite. Further discussion is beyond
the scope of this course, but see Math 112).


Functions as vectors IV. Just as the vector space C(R) of continuous
functions can be restricted to smaller sets, the space F(R, R) can also
be restricted. For any subset S of the real line, let F(S, R) denote
the vector space of all functions from S to R, thus a vector in this
space is a function f which assigns a real number f(x) to each x in S.
Two vectors f, g would be considered equal if f(x) = g(x) for each x
in S. For instance, if S is the two element set S := {0, 1}, then the
two functions f(x) := x^2 and g(x) := x would be considered the same
vector in F({0, 1}, R), because they equal the same value at 0 and 1.
Indeed, to specify any vector f in F({0, 1}, R), one just needs to specify
f(0) and f(1). As such, this space is very similar to R^2.
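When S is finite, a function on S is determined by its list of values, which is what makes F({0, 1}, R) look like R^2. The following Python sketch illustrates this; the helper name `as_vector` and the extra test set T are assumptions introduced here.

```python
def as_vector(f, S):
    """Record f as the tuple of its values on the finite set S (sorted order)."""
    return tuple(f(x) for x in sorted(S))


def f(x):
    return x ** 2


def g(x):
    return x


S = {0, 1}
print(as_vector(f, S), as_vector(g, S))     # (0, 1) (0, 1)
print(as_vector(f, S) == as_vector(g, S))   # True: same vector in F({0,1}, R)

T = {0, 1, 2}
print(as_vector(f, T) == as_vector(g, T))   # False: they differ at x = 2


def h(x):
    return f(x) + 3 * g(x)   # pointwise sum of f and the scalar multiple 3g


print(as_vector(h, S))                      # (0, 4)
```

Addition and scalar multiplication of functions are pointwise, so on S they reduce to the componentwise operations of R^2 on the value tuples.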


Sequences as vectors. An infinite sequence is a sequence of real
numbers

$$(a_1, a_2, a_3, a_4, \ldots);$$

for instance, a typical sequence is

$$(2, 4, 6, 8, 10, 12, \ldots).$$

Let R^∞ denote the vector space of all infinite sequences. These
sequences are added together by the rule

$$(a_1, a_2, \ldots) + (b_1, b_2, \ldots) = (a_1 + b_1, a_2 + b_2, \ldots)$$

and scalar multiplied by the rule

$$c(a_1, a_2, \ldots) = (ca_1, ca_2, \ldots).$$

This vector space is very much like the finite-dimensional vector spaces
R^2, R^3, ..., except that these sequences do not terminate.
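Although an infinite sequence cannot be stored in full, the termwise rules can still be modelled lazily. This is a sketch using Python iterators from the standard `itertools` module; the function names `seq_add`, `seq_scale`, `first_terms`, `evens` and `ones` are illustrative assumptions.

```python
from itertools import count, islice, repeat


def seq_add(a, b):
    """Termwise sum of two (possibly infinite) sequences, produced lazily."""
    return (x + y for x, y in zip(a, b))


def seq_scale(c, a):
    """Termwise scalar multiple of a (possibly infinite) sequence."""
    return (c * x for x in a)


def first_terms(seq, k):
    """Return the first k terms of a sequence as a list."""
    return list(islice(seq, k))


def evens():
    return count(2, 2)        # 2, 4, 6, 8, 10, 12, ...


def ones():
    return repeat(1)          # 1, 1, 1, 1, ...


print(first_terms(seq_add(evens(), ones()), 5))   # [3, 5, 7, 9, 11]
print(first_terms(seq_scale(10, evens()), 5))     # [20, 40, 60, 80, 100]
```

Only the terms actually requested are ever computed, which is how the code copes with sequences that do not terminate.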


Matrices as vectors. Given any integers m, n ≥ 1, we let M_{m×n}(R)
be the space of all m × n matrices (i.e. m rows and n columns) with
real entries, thus for instance M_{2×3}(R) contains such “vectors” as

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \begin{pmatrix} 0 & -1 & -2 \\ -3 & -4 & -5 \end{pmatrix}.$$

Two matrices are equal if and only if all of their individual components
match up; rearranging the entries of a matrix will produce a different
matrix. Matrix addition and scalar multiplication are defined similarly
to vectors:

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} + \begin{pmatrix} 0 & -1 & -2 \\ -3 & -4 & -5 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

$$10 \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 10 & 20 & 30 \\ 40 & 50 & 60 \end{pmatrix}.$$

Matrices are useful for many things, notably for solving linear equations
and for encoding linear transformations; more on these later in the
course.
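Entrywise matrix addition and scalar multiplication can be sketched in a few lines of Python. The representation (a matrix as a list of rows) and the names `mat_add` and `mat_scale` are choices made here for illustration; the sample matrices are 2 × 3 matrices chosen to match the reading of the example assumed above.

```python
def mat_add(A, B):
    """Add two matrices of the same shape entry by entry."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_scale(c, A):
    """Multiply every entry of the matrix A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[0, -1, -2],
     [-3, -4, -5]]

print(mat_add(A, B))        # [[1, 1, 1], [1, 1, 1]]
print(mat_scale(10, A))     # [[10, 20, 30], [40, 50, 60]]

# Rearranging entries gives a different matrix.
print(A == [[3, 2, 1], [4, 5, 6]])   # False
```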


As you can see, there are (infinitely!) many examples of vector spaces, 
some of which look very different from the familiar examples of R^2 and
R^3. Nevertheless, much of the theory we do here will cover all of these
examples simultaneously. When we depict these vector spaces on the
blackboard, we will draw them as if they were R^2 or R^3, but they are
often much larger, and each point we draw in the vector space, which 
represents a vector, could in reality stand for a very complicated object 
such as a polynomial, matrix, or function. So some of the pictures we 
draw should be interpreted more as analogies or metaphors than as a 
literal depiction of the situation. 


* * * * *


Non-vector spaces 


Now for some examples of things which are not vector spaces. 


Latitude and longitude. The location of any point on the earth can 
be described by two numbers, e.g. Los Angeles is 34 N, 118 W. This 
may look a lot like a two-dimensional vector in R^2, but the space of
all latitude-longitude pairs is not a vector space, because there is no 
reasonable way of adding or scalar multiplying such pairs. For instance, 
how could you multiply Los Angeles by 10? 340 N, 1180 W does not 
make sense. 


Unit vectors. In R^3, a unit vector is any vector with unit length, for
instance (0, 0, 1), (0, -1, 0), and (3/5, 0, 4/5) are all unit vectors. However
the space of all unit vectors (sometimes denoted S^2, for two-dimensional
sphere) is not a vector space as it is not closed under addition (or under
scalar multiplication).


• The positive real axis. The space R^+ of positive real numbers is
closed under addition, and obeys most of the rules of vector spaces, but 
is not a vector space, because one cannot multiply by negative scalars. 
(Also, it does not contain a zero vector). 


• Monomials. The space of monomials 1, x, x^2, x^3, ... does not form a
vector space - it is not closed under addition or scalar multiplication. 
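The failure of closure in these examples can be checked directly. The Python sketch below uses exact fractions so that the length test involves no rounding; the function names and the particular sample vectors are assumptions chosen for illustration.

```python
from fractions import Fraction


def add(v, w):
    return tuple(a + b for a, b in zip(v, w))


def scale(c, v):
    return tuple(c * a for a in v)


def is_unit(v):
    """A vector has unit length exactly when its squared length is 1."""
    return sum(a * a for a in v) == 1


e = (0, 0, 1)
f = (0, -1, 0)
g = (Fraction(3, 5), 0, Fraction(4, 5))

print(all(is_unit(v) for v in (e, f, g)))   # True
print(is_unit(add(e, f)))                   # False: squared length is 2
print(is_unit(scale(2, e)))                 # False: squared length is 4

# The positive real axis: a negative scalar multiple leaves the set.
x = 5
print(-1 * x > 0)                           # False
```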


* * * * *


Vector arithmetic 


• The vector space axioms I-VIII can be used to deduce all the other
familiar laws of vector arithmetic. For instance, we have 


• Vector cancellation law. If u, v, w are vectors such that u + v = u + w,
then v = w.

• Proof: Since u is a vector, we have an additive inverse -u such that
-u + u = 0, by axiom IV. Now we add -u to both sides of u + v = u + w:

$$-u + (u + v) = -u + (u + w).$$

Now use axiom II:

$$(-u + u) + v = (-u + u) + w$$

then axiom IV:

$$0 + v = 0 + w$$

then axiom III:

$$v = w.$$

• As you can see, these algebraic manipulations are rather trivial. After
the first week we usually won’t do these computations in such painful 
detail. 




• Some other simple algebraic facts, which you can amuse yourself with
by deriving them from the axioms:


$$0v = 0; \quad (-1)v = -v; \quad -(v + w) = (-v) + (-w); \quad a0 = 0; \quad a(-x) = (-a)x = -ax.$$


* * * * *


Vector subspaces


• Many vector spaces are subspaces of another. A vector space W is a
subspace of a vector space V if W ⊆ V (i.e. every vector in W is also a
vector in V), and the laws of vector addition and scalar multiplication
are consistent (i.e. if v_1 and v_2 are in W, and hence in V, the rule that
W gives for adding v_1 and v_2 gives the same answer as the rule that V
gives for adding v_1 and v_2.)


• For instance, the space P_2(R) - the vector space of polynomials of
degree at most 2 - is a subspace of P_3(R). Both are subspaces of P(R),
the vector space of polynomials of arbitrary degree. C([0, 1]), the space
of continuous functions on [0, 1], is a subspace of F([0, 1], R). And
so forth. (Technically, R^2 is not a subspace of R^3, because a
two-dimensional vector is not a three-dimensional vector. However, R^3
does contain subspaces which are almost identical to R^2. More on this
later).


• If V is a vector space, and W is a subset of V (i.e. W ⊆ V), then
of course we can add and scalar multiply vectors in W, since they are
automatically vectors in V. On the other hand, W is not necessarily
a subspace, because it may not be a vector space. (For instance, the
set S^2 of unit vectors in R^3 is a subset of R^3, but is not a subspace).
However, it is easy to check when a subset is a subspace:


• Lemma. Let V be a vector space, and let W be a subset of V. Then
W is a subspace of V if and only if the following two properties are
satisfied:


• (W is closed under addition) If w_1 and w_2 are in W, then w_1 + w_2 is
also in W.


• (W is closed under scalar multiplication) If w is in W and c is a scalar,
then cw is also in W.


Proof. First suppose that W is a subspace of V. Then W will be 
closed under addition and multiplication directly from the definition of 
vector space. This proves the “only if” part. 


Now we prove the harder “if” part. In other words, we assume that W is
a subset of V which is closed under addition and scalar multiplication, 
and we have to prove that W is a vector space. In other words, we 
have to verify the axioms I-VIII. 


Most of these axioms follow immediately because W is a subset of V,
and V already obeys the axioms I-VIII. For instance, since vectors v_1, v_2
in V obey the commutativity property v_1 + v_2 = v_2 + v_1, it automatically
follows that vectors in W also obey the property w_1 + w_2 = w_2 + w_1,
since all vectors in W are also vectors in V. This reasoning easily gives
us axioms I, II, V, VI, VII, VIII.


There is a potential problem with III though, because the zero vector 0
of V might not lie in W. Similarly with IV, there is a potential problem
that if w lies in W, then -w might not lie in W. But neither problem
can occur, because 0 = 0w and -w = (-1)w (Exercise: prove this
from the axioms!), and W is closed under scalar multiplication.


This Lemma makes it quite easy to generate a large number of vector 
spaces, simply by taking a big vector space and passing to a subset 
which is closed under addition and scalar multiplication. Some exam- 
ples: 


(Horizontal vectors) Recall that R^3 is the vector space of all vectors
(x, y, z) with x, y, z real. Let V be the subset of R^3 consisting of all
vectors with zero z co-ordinate, i.e. V := {(x, y, 0) : x, y ∈ R}. This is
a subset of R^3, but moreover it is also a subspace of R^3. To see this,
we use the Lemma. It suffices to show that V is closed under vector
addition and scalar multiplication. Let’s check the vector addition. If
we have two vectors in V, say (x_1, y_1, 0) and (x_2, y_2, 0), we need to
verify that the sum of these two vectors is still in V. But the sum is
just (x_1 + x_2, y_1 + y_2, 0), and this is in V because the z co-ordinate
is zero. Thus V is closed under vector addition. A similar argument
shows that V is closed under scalar multiplication, and so V is indeed
a subspace of R^3. (Indeed, V is very similar to - though technically not
the same thing as - R^2). Note that if we considered instead the space
of all vectors with z co-ordinate 1, i.e. {(x, y, 1) : x, y ∈ R}, then this
would be a subset but not a subspace, because it is not closed under
vector addition (or under scalar multiplication, for that matter).
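The two closure conditions of the Lemma can be spot-checked numerically. The following is only a sketch in Python; the helper names (add, scale, in_horizontal_plane, in_shifted_plane) are our own and do not come from the textbook, and testing a handful of vectors is of course not a proof.

```python
def add(u, v):
    """Component-wise vector addition."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * a for a in v)

def in_horizontal_plane(v):
    """Membership test for V = {(x, y, 0)}."""
    return v[2] == 0

def in_shifted_plane(v):
    """Membership test for {(x, y, 1)}, which is not a subspace."""
    return v[2] == 1

u, v = (1, 2, 0), (-4, 7, 0)
print(in_horizontal_plane(add(u, v)))     # True
print(in_horizontal_plane(scale(10, u)))  # True

p, q = (1, 2, 1), (-4, 7, 1)
print(in_shifted_plane(add(p, q)))        # False: the z co-ordinate is 2
print(in_shifted_plane(scale(10, p)))     # False: the z co-ordinate is 10
```

A single failing pair such as p, q is enough to show that a set is not a subspace, whereas passing checks only suggest closure and must be backed by an argument like the one above.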


Another example of a subspace of R^3 is the plane {(x, y, z) ∈ R^3 :
x + 2y + 3z = 0}. A third example of a subspace of R^3 is the line
{(t, 2t, 3t) : t ∈ R}. (Exercise: verify that these are indeed subspaces).
Notice how subspaces tend to be very flat objects which go through the
origin; this is consistent with them being closed under vector addition
and scalar multiplication.


In R^3, the only subspaces are lines through the origin, planes through
the origin, the whole space R^3, and the zero vector space {0}. In R^2,
the only subspaces are lines through the origin, the whole space R^2,
and the zero vector space {0}. (This is another clue as to why this
subject is called linear algebra).


(Even polynomials) Recall that P(R) is the vector space of all
polynomials f(x). Call a polynomial even if f(x) = f(-x); for instance,
f(x) = x^4 + 2x^2 + 3 is even, but f(x) = x^3 + 1 is not. Let P_even(R)
denote the set of all even polynomials, thus P_even(R) is a subset of
P(R). Now we show that P_even(R) is not just a subset, it is a
subspace of P(R). Again, it suffices to show that P_even(R) is closed under
vector addition and scalar multiplication. Let’s show it’s closed
under vector addition - i.e. if f and g are even polynomials, we have to
show that f + g is also even. In other words, we have to show that
f(-x) + g(-x) = f(x) + g(x). But this is clear since f(-x) = f(x)
and g(-x) = g(x). A similar argument shows why even polynomials
are closed under scalar multiplication.


(Diagonal matrices) Let n ≥ 1 be an integer. Recall that M_{n×n}(R)
is the vector space of n × n real matrices. Call a matrix diagonal if
all the entries away from the main diagonal (from top left to bottom
right) are zero, thus for instance

    [ 1  0  0 ]
    [ 0  2  0 ]
    [ 0  0  3 ]

is a diagonal matrix. Let D_n(R) denote the space of all diagonal n × n
matrices. This is a subset of M_{n×n}(R), and is also a subspace, because
the sum of any two diagonal matrices is again a diagonal matrix, and
the scalar product of a diagonal matrix and a scalar is still a diagonal
matrix. The notion of a diagonal matrix will become very useful
much later in the course.


(Trace zero matrices) Let n ≥ 1 be an integer. If A is an n × n
matrix, we define the trace of that matrix, denoted tr(A), to be the
sum of all the entries on the diagonal. For instance, if

        [ 1  2  3 ]
    A = [ 4  5  6 ]
        [ 7  8  9 ]

then

    tr(A) = 1 + 5 + 9 = 15.

Let M^0_{n×n}(R) denote the set of all n × n matrices whose trace is zero:

    M^0_{n×n}(R) := {A ∈ M_{n×n}(R) : tr(A) = 0}.

One can easily check that this space is a subspace of M_{n×n}(R). We will
return to traces much later in this course.
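As a rough illustration of the "easy check", here is a small Python sketch that computes traces for matrices stored as lists of rows. The function names (trace, mat_add, mat_scale) and the sample matrices B and C are our own choices for illustration; the check on two examples only hints at the general fact that tr(A + B) = tr(A) + tr(B) and tr(cA) = c tr(A).

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix (a list of rows)."""
    return sum(A[i][i] for i in range(len(A)))

def mat_add(A, B):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(trace(A))                # 15

# Two hypothetical trace zero matrices, chosen only for this example.
B = [[1, 2], [3, -1]]
C = [[4, 0], [5, -4]]
print(trace(mat_add(B, C)))    # 0
print(trace(mat_scale(7, B)))  # 0
```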


Technically speaking, every vector space V is considered a subspace of 
itself (since V is already closed under addition and scalar multiplica- 
tion). Also the zero vector space {0} is a subspace of every vector space 
(for a similar reason). But these are rather uninteresting examples of 
subspaces. We sometimes use the term proper subspace of V to denote 
a subspace W of V which is not the whole space V or the zero vector 
space {0}, but instead is something in between. 




• The intersection of two subspaces is again a subspace (why?). For
instance, since the diagonal matrices D_n(R) and the trace zero matrices
M^0_{n×n}(R) are both subspaces of M_{n×n}(R), their intersection
D_n(R) ∩ M^0_{n×n}(R) is also a subspace of M_{n×n}(R). On the other
hand, the union of two subspaces is usually not a subspace. For instance,
the x-axis {(x, 0) : x ∈ R} and y-axis {(0, y) : y ∈ R} are both subspaces
of R^2, but their union {(x, 0) : x ∈ R} ∪ {(0, y) : y ∈ R} is not (why?).
See Assignment 1 for more details.
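One way to get a feel for why the union of the two axes fails to be a subspace is to test a single sum. The Python sketch below uses helper names of our own (on_x_axis, on_y_axis, in_union); it is a hint rather than a full answer to the question above.

```python
def on_x_axis(v):
    """True if v = (x, 0) for some x."""
    return v[1] == 0

def on_y_axis(v):
    """True if v = (0, y) for some y."""
    return v[0] == 0

def in_union(v):
    """Membership in the union of the two axes."""
    return on_x_axis(v) or on_y_axis(v)

u = (1, 0)  # lies on the x-axis
w = (0, 1)  # lies on the y-axis
total = (u[0] + w[0], u[1] + w[1])

print(in_union(u), in_union(w))  # True True
print(total, in_union(total))    # (1, 1) False
```

Both u and w lie in the union, but their sum does not, so the union is not closed under vector addition.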


• In some texts one uses the notation W ≤ V to denote the statement
“W is a subspace of V”. I’ll avoid this as it may be a little confusing at
first. However, the notation is suggestive. For instance it is true that
if U ≤ W and W ≤ V, then U ≤ V; i.e. if U is a subspace of W, and
W is a subspace of V, then U is a subspace of V. (Why?)


* * * * *


Linear combinations 


• Let’s look at the standard vector space R^3, and try to build some
subspaces of this space. To get started, let’s pick a random vector in
R^3, say v := (1, 2, 3), and ask how to make a subspace V of R^3 which
would contain this vector (1, 2, 3). Of course, this is easy to accomplish
by setting V equal to all of R^3; this would certainly contain our single
vector v, but that is overkill. Let’s try to find a smaller subspace of R^3
which contains v.


• We could start by trying to make V just consist of the single point
(1, 2, 3): V := {(1, 2, 3)}. But this doesn’t work, because this space
is not a vector space; it is not closed under scalar multiplication. For
instance, 10(1, 2, 3) = (10, 20, 30) is not in the space. To make V a
vector space, we cannot just put (1, 2, 3) into V, we must also put
in all the scalar multiples of (1, 2, 3): (2, 4, 6), (3, 6, 9), (-1, -2, -3),
(0, 0, 0), etc. In other words,

    V ⊇ {a(1, 2, 3) : a ∈ R}.

Conversely, the space {a(1, 2, 3) : a ∈ R} is indeed a subspace of R^3
which contains (1, 2, 3). (Exercise!). This space is the one-dimensional
space which consists of the line going through the origin and (1, 2, 3).




• To summarize what we’ve seen so far, if one wants to find a subspace
V which contains a specified vector v, then it is not enough to contain
v; one must also contain the vectors av for all scalars a. As we shall see
later, the set {av : a ∈ R} will be called the span of v, and is denoted
span({v}).


• Now let’s suppose we have two vectors, v := (1, 2, 3) and w := (0, 0, 1),
and we want to construct a vector space V in R^3 which contains both
v and w. Again, setting V equal to all of R^3 will work, but let’s try to
get away with as small a space V as we can.


• We know that at a bare minimum, V has to contain not just v and w,
but also the scalar multiples av and bw of v and w, where a and b are
scalars. But V must also be closed under vector addition, so it must
also contain vectors such as av + bw. For instance, V must contain such
vectors as

    3v + 5w = 3(1, 2, 3) + 5(0, 0, 1) = (3, 6, 9) + (0, 0, 5) = (3, 6, 14).

We call a vector of the form av + bw a linear combination of v and w,
thus (3, 6, 14) is a linear combination of (1, 2, 3) and (0, 0, 1). The space
{av + bw : a, b ∈ R} of all linear combinations of v and w is called the
span of v and w, and is denoted span({v, w}). It is also a subspace of
R^3; it turns out to be the plane through the origin that contains both
v and w.


e More generally, we define the notions of linear combination and span 
as follows. 


• Definition. Let S be a collection of vectors in a vector space V (either
finite or infinite). A linear combination of S is defined to be any vector
in V of the form

    a_1 v_1 + a_2 v_2 + ... + a_n v_n

where a_1, ..., a_n are scalars (possibly zero or negative), and v_1, ..., v_n
are some elements in S. The span of S, denoted span(S), is defined to
be the space of all linear combinations of S:

    span(S) := {a_1 v_1 + a_2 v_2 + ... + a_n v_n : a_1, ..., a_n ∈ R; v_1, ..., v_n ∈ S}.




Usually we deal with the case when the set S is just a finite collection

    S = {v_1, ..., v_n}

of vectors. In that case the span is just

    span({v_1, ..., v_n}) = {a_1 v_1 + a_2 v_2 + ... + a_n v_n : a_1, ..., a_n ∈ R}.

(Why?)


Occasionally we will need to deal with the case when S is empty. In this
case we set the span span(∅) of the empty set to just be {0}, the zero
vector space. (Thus 0 is the only vector which is a linear combination of
an empty set of vectors. This is part of a larger mathematical convention,
which states that any summation over an empty set should be zero, and
every product over an empty set should be 1.)


Here are some basic properties of span. 


Theorem. Let S be a subset of a vector space V. Then span(S) is a 
subspace of V which contains S as a subset. Moreover, any subspace 
of V which contains S as a subset must in fact contain all of span(S). 


We shall prove this particular theorem in detail to illustrate how to go 
about giving a proof of a theorem such as this. In later theorems we 
will skim over the proofs more quickly. 


Proof. If S is empty then this theorem is trivial (in fact, it is rather
vacuous - it says that the space {0} contains all the elements of an
empty set of vectors, and that any subspace of V which contains the
elements of an empty set of vectors, must also contain {0}), so we shall
assume that S is non-empty, so that the linear combinations below
involve n ≥ 1 vectors. We now break up the theorem into its various
components.


(a) First we check that span(S) is a subspace of V. To do this we need 
to check three things: that span(S) is contained in V; that it is closed 
under addition; and that it is closed under scalar multiplication. 


(a.1) To check that span(S) is contained in V, we need to take a typical
element of the span, say a_1 v_1 + ... + a_n v_n, where a_1, ..., a_n are scalars
and v_1, ..., v_n ∈ S, and verify that it is in V. But this is clear since
v_1, ..., v_n were already in V and V is closed under addition and scalar
multiplication.


(a.2) To check that the space span(S) is closed under vector addition,
we take two typical elements of this space, say a_1 v_1 + ... + a_n v_n and
b_1 v_1 + ... + b_n v_n, where the a_j and b_j are scalars and v_j ∈ S for j =
1, ..., n, and verify that their sum is also in span(S). But the sum is

    (a_1 v_1 + ... + a_n v_n) + (b_1 v_1 + ... + b_n v_n)

which can be rearranged as

    (a_1 + b_1) v_1 + ... + (a_n + b_n) v_n

[Exercise: which of the vector space axioms I-VIII were needed in order
to do this?]. But since a_1 + b_1, ..., a_n + b_n are all scalars, we see that
this is indeed in span(S).

(a.3) To check that the space span(S) is closed under scalar multiplication,
we take a typical element of this space, say a_1 v_1 + ... + a_n v_n, and a typical
scalar c. We want to verify that the scalar product

    c(a_1 v_1 + ... + a_n v_n)

is also in span(S). But this can be rearranged as

    (c a_1) v_1 + ... + (c a_n) v_n

(which axioms were used here?). Since c a_1, ..., c a_n are scalars, we see
that we are in span(S) as desired.


(b) Now we check that span(S) contains S. It will suffice of course to
show that span(S) contains v for each v ∈ S. But each v is clearly a
linear combination of elements in S, in fact v = 1v and v ∈ S. Thus v
lies in span(S) as desired.


(c) Now we check that every subspace of V which contains S, also 
contains span(S). In order to stop from always referring to “that sub- 
space”, let us use W to denote a typical subspace of V which contains 
S. Our goal is to show that W contains span(S). 


This is the same as saying that every element of span(S) lies in W. So,
let v = a_1 v_1 + ... + a_n v_n be a typical element of span(S), where the a_j
are scalars and v_j ∈ S for j = 1, ..., n. Our goal is to show that v lies
in W.

Since v_1 lies in W, and W is closed under scalar multiplication, we see
that a_1 v_1 lies in W. Similarly a_2 v_2, ..., a_n v_n lie in W. But W is closed
under vector addition, thus a_1 v_1 + ... + a_n v_n lies in W, as desired. This
concludes the proof of the Theorem.


We remark that the span of a set of vectors does not depend on what 
order we list the set S: for instance, span({u,v,w}) is the same as 
span({w, v, u}). (Why is this?) 


The span of a set of vectors comes up often in applications, when one
has a certain number of “moves” available in a system, and one wants
to see what options are available by combining these moves. We give
an example, from a simple economic model, as follows.


Suppose you run a car company, which uses some basic raw materials
- let’s say money, labor, metal, for sake of argument - to produce some
cars. At any given point in time, your resources might consist of x
units of money, y units of labor (measured, say, in man-hours), z units
of metal, and w units of cars, which we represent by a vector (x, y, z, w).


Now you can make various decisions to alter your balance of resources.
For instance, suppose you could purchase a unit of metal for two units
of money - this amounts to adding (-2, 0, 1, 0) to your resource vector.
You could do this repeatedly, thus adding a(-2, 0, 1, 0) to your resource
vector for any positive a. (If you could also sell a unit of metal for two
units of money, then a could also be negative. Of course, a can always
be zero, simply by refusing to buy or sell any metal). Similarly, one
might be able to purchase a unit of labor for three units of money, thus
adding (-3, 1, 0, 0) to your resource vector. Finally, to produce a car
requires 4 units of labor and 5 units of metal, thus adding (0, -4, -5, 1)
to your resource vector. (This is of course an extremely oversimplified
model, but will serve to illustrate the point).


Now we ask the question of how much money it will cost to create a
car - in other words, for what price x can we add (-x, 0, 0, 1) to our
resource vector? The answer is 22, because

    (-22, 0, 0, 1) = 5(-2, 0, 1, 0) + 4(-3, 1, 0, 0) + 1(0, -4, -5, 1)

and so one can convert 22 units of money to one car by buying 5 units
of metal, 4 units of labor, and producing one car. On the other hand,
it is not possible to obtain a car for a smaller amount of money using
the moves available (why?). In other words, (-22, 0, 0, 1) is the unique
vector of the form (-x, 0, 0, 1) which lies in the span of the vectors
(-2, 0, 1, 0), (-3, 1, 0, 0), and (0, -4, -5, 1).
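The arithmetic behind the price of a car can be checked mechanically. The sketch below is in Python; the helper combine and the variable names buy_metal, buy_labor and build_car are hypothetical labels of our own for the three moves described above.

```python
def combine(coeffs, vectors):
    """Return the linear combination sum of c_i * v_i, computed component-wise."""
    size = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(size))

# Resource vectors are (money, labor, metal, cars).
buy_metal = (-2, 0, 1, 0)   # two units of money for one unit of metal
buy_labor = (-3, 1, 0, 0)   # three units of money for one unit of labor
build_car = (0, -4, -5, 1)  # four labor and five metal for one car

# Buy 5 metal, buy 4 labor, build 1 car.
net_change = combine([5, 4, 1], [buy_metal, buy_labor, build_car])
print(net_change)  # (-22, 0, 0, 1)
```

The labor and metal entries cancel exactly, which is why the coefficients 5 and 4 are forced once we insist on building exactly one car.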


Of course, the above example was so simple that we could have worked 
out the price of a car directly. But in more complicated situations 
(where there aren’t so many zeroes in the vector entries) one really has 
to start computing the span of various vectors. [Actually, things get 
more complicated than this because in real life there are often other 
constraints. For instance, one may be able to buy labor for money, but 
one cannot sell labor to get the money back - so the scalar in front of 
(-3, 1, 0, 0) can be positive but not negative. Or storage constraints
might limit how much metal can be purchased at a time, etc. This 
passes us from linear algebra to the more complicated theory of linear 
programming, which is beyond the scope of this course. Also, due to 
such things as the law of diminishing returns and the law of economies 
of scale, in real life situations are not quite as linear as presented in 
this simple model. This leads us eventually to non-linear optimization 
and control theory, which is again beyond the scope of this course.]


This leads us to ask the following question: How can we tell when one
given vector v is in the span of some other vectors v_1, v_2, ..., v_n? For
instance, is the vector (0, 1, 2) in the span of (1, 1, 1), (3, 2, 1), (1, 0, 1)?
This is the same as asking for scalars a_1, a_2, a_3 such that

    (0, 1, 2) = a_1 (1, 1, 1) + a_2 (3, 2, 1) + a_3 (1, 0, 1).

We can multiply out the right-hand side as

    (a_1 + 3a_2 + a_3, a_1 + 2a_2, a_1 + a_2 + a_3)

and so we are asking to find a_1, a_2, a_3 that solve the equations

    a_1 + 3a_2 + a_3 = 0
    a_1 + 2a_2 = 1
    a_1 + a_2 + a_3 = 2.


This is a linear system of equations, “system” because it consists of
more than one equation, and “linear” because the variables a_1, a_2, a_3
only appear as linear factors (as opposed to quadratic factors such as
a_1^2 or a_2 a_3, or more non-linear factors such as sin(a_1)). Such a system
can also be written in matrix form

    [ 1  3  1 ] [ a_1 ]   [ 0 ]
    [ 1  2  0 ] [ a_2 ] = [ 1 ]
    [ 1  1  1 ] [ a_3 ]   [ 2 ]

or schematically as

    [ 1  3  1 | 0 ]
    [ 1  2  0 | 1 ]
    [ 1  1  1 | 2 ]


To actually solve this system of equations and find a_1, a_2, a_3, one of
the best methods is to use Gaussian elimination. The idea of Gaussian
elimination is to try to make as many as possible of the numbers in
the matrix equal to zero, as this will make the linear system easier to
solve. There are three basic moves:


Swap two rows: Since it does not matter which order we display the 
equations of a system, we are free to swap any two rows of the sys- 
tem. This is mostly a cosmetic move, useful in making the system look 
prettier. 


Multiply a row by a constant: We can multiply (or divide) both sides 
of an equation by any constant (although we want to avoid multiplying 
a row by 0, as that reduces that equation to the trivial 0=0, and the 
operation cannot be reversed since division by 0 is illegal). This is 
again a mostly cosmetic move, useful for setting one of the co-efficients 
in the matrix to 1. 




e Subtract a multiple of one row from another: This is the main move. 
One can take any row, multiply it by any scalar, and subtract (or 
add) the resulting object from a second row; the original row remains 
unchanged. The main purpose of this is to set one or more of the matrix 
entries of the second row to zero. 


We illustrate these moves with the above system. We could use the
matrix form or the schematic form, but we shall stick with the linear
system form for now:

    a_1 + 3a_2 + a_3 = 0
    a_1 + 2a_2 = 1
    a_1 + a_2 + a_3 = 2.

We now start zeroing the a_1 entries by subtracting the first row from
the second:

    a_1 + 3a_2 + a_3 = 0
    -a_2 - a_3 = 1
    a_1 + a_2 + a_3 = 2

and also subtracting the first row from the third:

    a_1 + 3a_2 + a_3 = 0
    -a_2 - a_3 = 1
    -2a_2 = 2.

The third row looks simplifiable, so we swap it up

    a_1 + 3a_2 + a_3 = 0
    -2a_2 = 2
    -a_2 - a_3 = 1

and then divide it by -2:

    a_1 + 3a_2 + a_3 = 0
    a_2 = -1
    -a_2 - a_3 = 1.

Then we can zero the a_2 entries by subtracting 3 copies of the second
row from the first, and adding one copy of the second row to the third:

    a_1 + a_3 = 3
    a_2 = -1
    -a_3 = 0.

If we then multiply the third row by -1 and then subtract it from the
first, we obtain

    a_1 = 3
    a_2 = -1
    a_3 = 0

and so we have found the solution, namely a_1 = 3, a_2 = -1, a_3 = 0.


Getting back to our original problem, we have indeed found that (0, 1, 2)
is in the span of (1, 1, 1), (3, 2, 1), (1, 0, 1):

    (0, 1, 2) = 3(1, 1, 1) + (-1)(3, 2, 1) + 0(1, 0, 1).
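For readers who like to see the three moves carried out mechanically, here is a minimal Python sketch of Gaussian elimination on the schematic (augmented) form, using exact fractions from the standard library. The function name solve and its convention of returning None when there is no unique solution are our own assumptions; the routine also applies the moves in a fixed column-by-column order, so its intermediate steps differ from the hand computation above even though the answer agrees.

```python
from fractions import Fraction

def solve(aug):
    """Row-reduce an augmented matrix [A | b] with the three basic moves.

    Returns the unique solution as a list of Fractions, or None if the
    system has no solution or more than one solution.
    """
    rows = [[Fraction(x) for x in row] for row in aug]
    n_rows, n_vars = len(rows), len(rows[0]) - 1
    pivot_row = 0
    for col in range(n_vars):
        # Move 1: swap a row with a non-zero entry in this column into place.
        pivot = next((r for r in range(pivot_row, n_rows)
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Move 2: divide the pivot row so its leading co-efficient is 1.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Move 3: subtract multiples of the pivot row from every other row.
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y
                           for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # A row reading 0 = (non-zero) means the system has no solution.
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in rows):
        return None
    if pivot_row < n_vars:
        return None  # some variable is free, so the solution is not unique
    return [rows[i][-1] for i in range(n_vars)]

system = [[1, 3, 1, 0],
          [1, 2, 0, 1],
          [1, 1, 1, 2]]
solution = solve(system)
print([str(x) for x in solution])  # ['3', '-1', '0']
```

Exact fractions are used instead of floating point so that tests such as "is this entry zero?" are reliable.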


In the above case we found that there was only one solution for a_1,
a_2, a_3 - they were exactly determined by the linear system. Sometimes
there can be more than one solution to a linear system, in which case
we say that the system is under-determined - there are not enough
equations to pin down all the variables exactly. This usually happens
when the number of unknowns exceeds the number of equations. For
instance, suppose we wanted to show that (0, 1, 2) is in the span of the
four vectors (1, 1, 1), (3, 2, 1), (1, 0, 1), (0, 0, 1):

    (0, 1, 2) = a_1 (1, 1, 1) + a_2 (3, 2, 1) + a_3 (1, 0, 1) + a_4 (0, 0, 1).

This is the system

    a_1 + 3a_2 + a_3 = 0
    a_1 + 2a_2 = 1
    a_1 + a_2 + a_3 + a_4 = 2.


Now we do Gaussian elimination again. Subtracting the first row from
the second and third:

    a_1 + 3a_2 + a_3 = 0
    -a_2 - a_3 = 1
    -2a_2 + a_4 = 2.

Multiplying the second row by -1, then eliminating a_2 from the first
and third rows:

    a_1 - 2a_3 = 3
    a_2 + a_3 = -1
    2a_3 + a_4 = 0.


At this stage the system is in reduced normal form, which means that, 
starting from the bottom row and moving upwards, each equation intro- 
duces at least one new variable (ignoring any rows which have collapsed 
to something trivial like 0 = 0). Once one is in reduced normal form, 
there isn’t much more simplification one can do. In this case there is 
no unique solution; one can set a4 to be arbitrary. The third equation 
then allows us to write a3 in terms of a4: 


    a_3 = -a_4/2

while the second equation then allows us to write a_2 in terms of a_3 (and
thus of a_4):

    a_2 = -1 - a_3 = -1 + a_4/2.

Similarly we can write a_1 in terms of a_4:

    a_1 = 3 + 2a_3 = 3 - a_4.


Thus the general way to write (0, 1, 2) as a linear combination of (1, 1, 1),
(3, 2, 1), (1, 0, 1), (0, 0, 1) is

    (0, 1, 2) = (3 - a_4)(1, 1, 1) + (-1 + a_4/2)(3, 2, 1) + (-a_4/2)(1, 0, 1) + a_4 (0, 0, 1);

for instance, setting a_4 = 4, we have

    (0, 1, 2) = -(1, 1, 1) + (3, 2, 1) - 2(1, 0, 1) + 4(0, 0, 1)

while if we set a_4 = 0, then we have

    (0, 1, 2) = 3(1, 1, 1) - 1(3, 2, 1) + 0(1, 0, 1) + 0(0, 0, 1)

as before. Thus not only is (0, 1, 2) in the span of (1, 1, 1), (3, 2, 1),
(1, 0, 1), and (0, 0, 1), it can be written as a linear combination of such
vectors in many ways. This is because some of the vectors in this set are
redundant - as we already saw, we only needed the first three vectors
(1, 1, 1), (3, 2, 1) and (1, 0, 1) to generate (0, 1, 2); the fourth vector
(0, 0, 1) was not necessary. As we shall see, this is because the four
vectors (1, 1, 1), (3, 2, 1), (1, 0, 1), and (0, 0, 1) are linearly dependent.
More on this later.
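The one-parameter family of solutions can be sanity-checked by plugging in several values of a_4. The Python sketch below does this with exact fractions; the names VECTORS, coefficients and combine, and the particular sample values, are our own choices.

```python
from fractions import Fraction

VECTORS = [(1, 1, 1), (3, 2, 1), (1, 0, 1), (0, 0, 1)]

def coefficients(a4):
    """The solution (a_1, a_2, a_3, a_4) determined by a choice of a_4."""
    a4 = Fraction(a4)
    return [3 - a4, -1 + a4 / 2, -a4 / 2, a4]

def combine(coeffs, vectors):
    """Return the linear combination sum of c_i * v_i, component-wise."""
    size = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(size))

samples = [0, 4, -7, Fraction(1, 3)]
all_ok = all(combine(coefficients(a4), VECTORS) == (0, 1, 2)
             for a4 in samples)
print(all_ok)  # True
```

Every choice of a_4 gives a different set of coefficients but the same vector (0, 1, 2), which is exactly what "under-determined" means here.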




• Of course, sometimes a vector will not be in the span of other vectors
at all. For instance, (0, 1, 2) is not in the span of (3, 2, 1) and (1, 0, 1).
If one were to try to solve the system

    (0, 1, 2) = a_1 (3, 2, 1) + a_2 (1, 0, 1)

one would be solving the system

    3a_1 + a_2 = 0
    2a_1 = 1
    a_1 + a_2 = 2.

If one swapped the first and second rows, then divided the first by two,
and then used it to eliminate a_1 from the other two rows, one obtains

    a_1 = 1/2
    a_2 = -3/2
    a_2 = 3/2.

The last two equations contradict each other. Thus there is no solution,
and (0, 1, 2) is not in the span.
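The contradiction can also be seen by solving the equations one at a time. In the short Python sketch below (variable names are ours), the second equation fixes a_1, and then the first and third equations each demand a different value of a_2.

```python
from fractions import Fraction

# The second equation 2*a_1 = 1 forces the value of a_1.
a1 = Fraction(1, 2)

# The first equation 3*a_1 + a_2 = 0 forces one value of a_2 ...
a2_from_first = 0 - 3 * a1
# ... while the third equation a_1 + a_2 = 2 forces another.
a2_from_third = 2 - a1

print(a2_from_first, a2_from_third)    # -3/2 3/2
print(a2_from_first == a2_from_third)  # False, so no solution exists
```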


* * * * *


Spanning sets 


• Definition. A set S is said to span a vector space V if span(S) = V;
i.e. every vector in V is generated as a linear combination of elements
of S. We call S a spanning set for V. (Sometimes one uses the verb
“generated” instead of “spanned”, thus V is generated by S and S is a
generating set for V.)




• A model example of a spanning set is the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
in R^3; every vector in R^3 can clearly be written as a linear combination
of these three vectors, e.g.

    (3, 7, 13) = 3(1, 0, 0) + 7(0, 1, 0) + 13(0, 0, 1).

There are of course similar examples for other vector spaces. For
instance, the set {1, x, x^2, x^3} spans P_3(R) (why?).


e One can always add additional vectors to a spanning set and still get a 
spanning set. For instance, the set {(1, 0,0), (0, 1,0), (0,0, 1), (9, 14, 23), (15, 24, 99)} 
is also a spanning set for R^3, for instance


(3, 7, 13) = 3(1, 0, 0) +7(0, 1,0) +13(0, 0, 1) +0(9, 14, 23) + 0(15, 24, 99). 


Of course the last two vectors are not playing any significant role here, 
and are just along for the ride. A more extreme example: every vector 
space V is a spanning set for itself, span(V) = V. 


e On the other hand, removing elements from a spanning set can cause it 
to stop spanning. For instance, the two-element set {(1, 0,0), (0,1,0)} 
does not span, because there is no way to write (3,7, 13) (for instance) 
as a linear combination of (1,0,0), (0, 1,0). 


• Spanning sets are useful because they allow one to describe all the
vectors in a space V in terms of a much smaller set S. For instance,
the set S := {(1, 0, 0), (0, 1, 0), (0, 0, 1)} only consists of three vectors,
whereas the space R^3 which S spans consists of infinitely many
vectors. Thus, in principle, in order to understand the infinitely many
vectors in R^3, one only needs to understand the three vectors in S (and
to understand what linear combinations are).


• However, as we see from the above examples, spanning sets can contain
“junk” vectors which are not actually needed to span the space. Such junk
occurs when the set is linearly dependent. We would like to now remove 
such junk from the spanning sets and create a “minimal” spanning set 
- a set whose elements are all linearly independent. Such a set is known 
as a basis. In the rest of this series of lecture notes we discuss these 
related concepts of linear dependence, linear independence, and being 
a basis. 




* * * * *
Linear dependence and independence 
• Consider the following three vectors in R^3: v_1 := (1, 2, 3), v_2 := (1, 1, 1),
v_3 := (3, 5, 7). As we now know, the span span({v_1, v_2, v_3}) of this set
is just the set of all linear combinations of v_1, v_2, v_3:

    span({v_1, v_2, v_3}) = {a_1 v_1 + a_2 v_2 + a_3 v_3 : a_1, a_2, a_3 ∈ R}.

Thus, for instance 3(1, 2, 3) + 4(1, 1, 1) + 1(3, 5, 7) = (10, 15, 20) lies in
the span. However, the (3, 5, 7) vector is redundant because it can be
written in terms of the other two:

    v_3 = (3, 5, 7) = 2(1, 2, 3) + (1, 1, 1) = 2v_1 + v_2

or more symmetrically

    2v_1 + v_2 - v_3 = 0.

Thus any linear combination of v_1, v_2, v_3 is in fact just a linear
combination of v_1 and v_2:

    a_1 v_1 + a_2 v_2 + a_3 v_3 = a_1 v_1 + a_2 v_2 + a_3 (2v_1 + v_2) = (a_1 + 2a_3) v_1 + (a_2 + a_3) v_2.
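The redundancy of v_3 is easy to confirm numerically. This Python sketch (with a helper combine of our own) checks the dependence relation and the rewriting of a combination of three vectors as a combination of two, for the sample coefficients 3, 4, 1 used above.

```python
def combine(coeffs, vectors):
    """Return the linear combination sum of c_i * v_i, component-wise."""
    size = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(size))

v1, v2, v3 = (1, 2, 3), (1, 1, 1), (3, 5, 7)

# The dependence relation 2*v1 + v2 - v3 = 0.
print(combine([2, 1, -1], [v1, v2, v3]))  # (0, 0, 0)

# A combination of all three vectors, rewritten using only v1 and v2.
a1, a2, a3 = 3, 4, 1
with_three = combine([a1, a2, a3], [v1, v2, v3])
with_two = combine([a1 + 2 * a3, a2 + a3], [v1, v2])
print(with_three, with_two)  # (10, 15, 20) (10, 15, 20)
```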


• Because of this redundancy, we say that the vectors v_1, v_2, v_3 are linearly
dependent. More generally, we say that any collection S of vectors in a
vector space V are linearly dependent if we can find distinct elements
v_1, ..., v_n ∈ S, and scalars a_1, ..., a_n, not all equal to zero, such that

    a_1 v_1 + a_2 v_2 + ... + a_n v_n = 0.


• (Of course, 0 can always be written as a linear combination of v_1, ..., v_n
in a trivial way: 0 = 0v_1 + ... + 0v_n. Linear dependence means that this
is not the only way to write 0 as a linear combination, that there exists
at least one non-trivial way to do so). We need the condition that the
v_1, ..., v_n are distinct to avoid silly things such as 2v_1 + (-2)v_1 = 0.

• In the case where S is a finite set S = {v_1, ..., v_n}, then S is linearly
dependent if and only if we can find scalars a_1, ..., a_n not all zero such
that

    a_1 v_1 + ... + a_n v_n = 0.

(Why is this the same as the previous definition? It’s a little subtle).




• If a collection of vectors S is not linearly dependent, then they are said
to be linearly independent. An example is the set {(1, 2, 3), (0, 1, 2)}; it
is not possible to find a_1, a_2, not both zero for which

    a_1 (1, 2, 3) + a_2 (0, 1, 2) = 0,

because this would imply

    a_1 = 0
    2a_1 + a_2 = 0
    3a_1 + 2a_2 = 0,

which can easily be seen to only be true if a_1 and a_2 are both 0. Thus
there is no non-trivial way to write the zero vector 0 = (0, 0, 0) as a
linear combination of (1, 2, 3) and (0, 1, 2).
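For a finite list of vectors in R^n, one mechanical test for independence (not the method used in these notes, but built from the same row moves as Gaussian elimination) is to row-reduce the vectors themselves and see whether any row collapses to zero. The Python sketch below does this with exact fractions; the function names count_pivots and is_independent are our own.

```python
from fractions import Fraction

def count_pivots(vectors):
    """Row-reduce the vectors (as rows) and count the non-zero rows left."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    count = 0
    for col in range(len(rows[0])):
        # Look for a row, not yet used as a pivot, with a non-zero entry here.
        pivot = next((r for r in range(count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[count], rows[pivot] = rows[pivot], rows[count]
        # Subtract multiples of the pivot row from the rows below it.
        for r in range(count + 1, len(rows)):
            factor = rows[r][col] / rows[count][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[count])]
        count += 1
    return count

def is_independent(vectors):
    """True if no row collapses to zero, i.e. every vector yields a pivot."""
    return count_pivots(vectors) == len(vectors)

print(is_independent([(1, 2, 3), (0, 1, 2)]))             # True
print(is_independent([(1, 2, 3), (1, 1, 1), (3, 5, 7)]))  # False
```

The second call reports dependence because (3, 5, 7) = 2(1, 2, 3) + (1, 1, 1), so the third row is wiped out during the reduction.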


e By convention, an empty set of vectors (with n = 0) is always linearly 
independent (why is this consistent with the definition?) 


e As indicated above, if a set is linearly dependent, then we can remove 
one of the elements from it without affecting the span. 


• Theorem. Let S be a subset of a vector space V. If S is linearly
dependent, then there exists an element v of S such that the smaller
set S - {v} has the same span as S:

    span(S - {v}) = span(S).

Conversely, if S is linearly independent, then every proper subset S' ⊊
S of S will span a strictly smaller set than S:

    span(S') ⊊ span(S).


• Proof. Let’s prove the first claim: if S is a linearly dependent subset
of V, then we can find v ∈ S such that span(S - {v}) = span(S).

• Since S is linearly dependent, then by definition there exist distinct
v_1, ..., v_n ∈ S and scalars a_1, ..., a_n, not all zero, such that

    a_1 v_1 + ... + a_n v_n = 0.

We know that at least one of the a_j is non-zero; without loss of
generality we may assume that a_1 is non-zero (since otherwise we can just
shuffle the v_j to bring the non-zero coefficient out to the front). We
can then solve for v_1 by dividing by a_1:

    v_1 = -(a_2/a_1) v_2 - ... - (a_n/a_1) v_n.

Thus any expression involving v_1 can instead be written to involve
v_2, ..., v_n instead. Thus any linear combination of v_1 and other vectors
in S not equal to v_1 can be rewritten instead as a linear combination
of v_2, ..., v_n and other vectors in S not equal to v_1. Thus every linear
combination of vectors in S can in fact be written as a linear combination
of vectors in S - {v_1}. On the other hand, every linear combination
of S - {v_1} is trivially also a linear combination of S. Thus we have
span(S) = span(S - {v_1}) as desired.


Now we prove the other direction. Suppose that S ⊆ V is linearly
independent, and let S' ⊊ S be a proper subset of S. Since every
linear combination of S' is trivially a linear combination of S, we have
that span(S') ⊆ span(S). So now we just need to argue why span(S') ≠
span(S).

Let v be an element of S which is not contained in S'; such an element
must exist because S' is a proper subset of S. Since v ∈ S, we have
v ∈ span(S). Now suppose that v were also in span(S'). This would
mean that there existed vectors v_1, ..., v_n ∈ S' (which in particular
were distinct from v) such that

    v = a_1 v_1 + a_2 v_2 + ... + a_n v_n,

or in other words

    (-1)v + a_1 v_1 + a_2 v_2 + ... + a_n v_n = 0.

But this is a non-trivial linear combination of vectors in S which sum to
zero (it’s nontrivial because of the -1 coefficient of v). This contradicts
the assumption that S is linearly independent. Thus v cannot possibly
be in span(S'). But this means that span(S') and span(S) are different,
and we are done.




KOK OK OK 


Bases 


e A basis of a vector space V is a set S which spans V, while also being 
linearly independent. In other words, a basis consists of a bare mini- 
mum number of vectors needed to span all of V; remove one of them, 
and you fail to span V. 


• Thus the set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is a basis for R^3, because it
both spans and is linearly independent. The set
{(1, 0, 0), (0, 1, 0), (0, 0, 1), (9, 14, 23)} still spans R^3, but is not
linearly independent and so is not a basis. The set {(1, 0, 0), (0, 1, 0)} is
linearly independent, but does not span all of R^3 so is not a basis.
Finally, the set {(1, 0, 0), (2, 0, 0)} is neither linearly independent nor
spanning, so is definitely not a basis.


• Similarly, the set {1, x, x^2, x^3} is a basis for P_3(R), while the set
{1, x, 1 + x, x^2, x^2 + x^3, x^3} is not (it still spans, but is linearly
dependent). The set {1, x + x^2, x^3} is linearly independent, but doesn’t
span.


e One can use a basis to represent a vector in a unique way as a collection 
of numbers: 


• Lemma. Let {v_1, v_2, ..., v_n} be a basis for a vector space V. Then
every vector v in V can be written uniquely in the form

    v = a_1 v_1 + ... + a_n v_n

for some scalars a_1, ..., a_n.


• Proof. Because {v_1, ..., v_n} is a basis, it must span V, and so every
vector v in V can be written in the form a_1 v_1 + ... + a_n v_n. It only remains
to show why this representation is unique. Suppose for contradiction
that a vector v had two different representations

    v = a_1 v_1 + ... + a_n v_n
    v = b_1 v_1 + ... + b_n v_n

where a_1, ..., a_n are one set of scalars, and b_1, ..., b_n are a different set
of scalars. Subtracting the two equations we get

    (a_1 - b_1) v_1 + ... + (a_n - b_n) v_n = 0.

But the v_1, ..., v_n are linearly independent, since they are a basis. Thus
the only representation of 0 as a linear combination of v_1, ..., v_n is the
trivial representation, which means that the scalars a_1 - b_1, ..., a_n - b_n
must all be equal to zero. That means that a_1 = b_1, ..., a_n = b_n, so the
two representations a_1 v_1 + ... + a_n v_n, b_1 v_1 + ... + b_n v_n must in fact
be the same representation. Thus v cannot have two distinct
representations, and so we have a unique representation as desired.


As an example, let v_1 and v_2 denote the vectors v_1 := (1, 1) and v_2 :=
(1, -1) in R^2. One can check that these two vectors span R^2 and are
linearly independent, and so they form a basis. Any typical element,
e.g. (3, 5), can be written uniquely in terms of v_1 and v_2:

    (3, 5) = 4(1, 1) - 1(1, -1) = 4v_1 - v_2.
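For this particular basis the coefficients can be found by hand: asking for a_1 (1, 1) + a_2 (1, -1) = (x, y) gives a_1 + a_2 = x and a_1 - a_2 = y, so a_1 = (x + y)/2 and a_2 = (x - y)/2. The Python sketch below just packages this formula; the function name coordinates is ours, and it assumes integer inputs so that exact fractions can be formed.

```python
from fractions import Fraction

def coordinates(x, y):
    """Coefficients (a_1, a_2) with a_1*(1, 1) + a_2*(1, -1) = (x, y).

    Assumes x and y are integers so that exact fractions can be used.
    """
    a1 = Fraction(x + y, 2)
    a2 = Fraction(x - y, 2)
    return a1, a2

a1, a2 = coordinates(3, 5)
print(a1, a2)  # 4 -1

# Rebuild the vector from its coefficients as a check.
rebuilt = (a1 * 1 + a2 * 1, a1 * 1 + a2 * (-1))
print(rebuilt == (3, 5))  # True
```

Since the formula produces exactly one pair (a_1, a_2) for each (x, y), it illustrates both the spanning and the uniqueness parts of the Lemma for this basis.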


In principle, we could write all vectors in R^2 this way, but it would
be a rather non-standard way to do so, because this basis is rather 
non-standard. Fortunately, most vector spaces have “standard” bases 
which we use to represent them: 


The standard basis of $\mathbf{R}^n$ is $\{e_1, e_2, \ldots, e_n\}$, where $e_j$ is the
vector whose $j$-th entry is 1 and all the others are 0. Thus for instance,
the standard basis of $\mathbf{R}^3$ consists of $e_1 := (1,0,0)$, $e_2 := (0,1,0)$,
and $e_3 := (0,0,1)$.


The standard basis of the space $P_n(\mathbf{R})$ is $\{1, x, x^2, \ldots, x^n\}$. The
standard basis of $P(\mathbf{R})$ is the infinite set $\{1, x, x^2, \ldots\}$.


One can concoct similar standard bases for matrix spaces $M_{m \times n}(\mathbf{R})$
(just take those matrices with a single coefficient 1 and all the others 
zero). However, there are other spaces (such as C(R)) which do not 
have a reasonable standard basis. 




Math 115A - Week 2 
Textbook sections: 1.6-2.1 
Topics covered: 


• Properties of bases
• Dimension of vector spaces
• Lagrange interpolation
• Linear transformations


* * * * *


Review of bases 


• In last week's notes, we had just defined the concept of a basis. Just
to quickly review the relevant definitions:

• Let V be a vector space, and S be a subset of V. The span of S is the
set of all linear combinations of elements in S; this space is denoted
span(S) and is a subspace of V. If span(S) is in fact equal to V, we say
that S spans V.

• We say that S is linearly dependent if there is some non-trivial way to
write 0 as a linear combination of elements of S. Otherwise we say that
S is linearly independent.

• We say that S is a basis for V if it spans V and is also linearly
independent.

• Generally speaking, the larger the set is, the more likely it is to span,
but also the less likely it is to remain linearly independent. In some
sense, bases form the boundary between the "large" sets which span
but are not independent, and the "small" sets which are independent
but do not span.


* * * * *


Examples of bases 




• Why are bases useful? One reason is that they give a compact way to
describe vector spaces. For instance, one can describe $\mathbf{R}^3$ as the
vector space spanned by the basis $\{(1,0,0), (0,1,0), (0,0,1)\}$:

$$\mathbf{R}^3 = \mathrm{span}(\{(1,0,0), (0,1,0), (0,0,1)\}).$$

In other words, the three vectors $(1,0,0)$, $(0,1,0)$, $(0,0,1)$ are linearly
independent, and $\mathbf{R}^3$ is precisely the set of all vectors which can be
written as linear combinations of $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$.


• Similarly, one can describe $P(\mathbf{R})$ as the vector space spanned by the
basis $\{1, x, x^2, x^3, \ldots\}$. Or $P_{even}(\mathbf{R})$, the vector space of even
polynomials, is the vector space spanned by the basis
$\{1, x^2, x^4, x^6, \ldots\}$ (why?).


• Now for a more complicated example. Consider the space

$$V := \{(x,y,z) \in \mathbf{R}^3 : x + y + z = 0\};$$

in other words, V consists of all the elements in $\mathbf{R}^3$ whose co-ordinates
sum to zero. Thus for instance $(3,5,-8)$ lies in V, but $(3,5,-7)$ does
not. The space V describes a plane in $\mathbf{R}^3$; if you remember your Math
32A, you'll recall that this is the plane through the origin which is
perpendicular to the vector $(1,1,1)$. It is a subspace of $\mathbf{R}^3$, because it
is closed under vector addition and scalar multiplication (why?).


• Now let's try to find a basis for this space. A straightforward, but slow,
procedure for doing so is to try to build a basis one vector at a time:
we put one vector in V into the (potential) basis, and see if it spans.
If it doesn't, we throw another (linearly independent) vector into the
basis, and then see if it spans. We keep repeating this process until
eventually we get a linearly independent set spanning the entire space
- i.e. a basis. (Every time one adds more vectors to a set S, the span
span(S) must get larger, or at least stay the same size - why?)


• To begin this algorithm, let's pick an element of the space V. We can't
pick 0 - any set with 0 is automatically linearly dependent (why?), but
there are other, fairly simple vectors in V; let's pick $v_1 := (1,0,-1)$.
This vector is in V, but it doesn't span V: the linear combinations of $v_1$
are all of the form $(a, 0, -a)$, where $a \in \mathbf{R}$ is a scalar, but this
doesn't include all the vectors in V. For instance, $v_2 := (1,-1,0)$ is
clearly not in the span of $v_1$. So now we take both $v_1$ and $v_2$ and see
if they span. A typical linear combination of $v_1$ and $v_2$ is


$$a_1 v_1 + a_2 v_2 = a_1(1,0,-1) + a_2(1,-1,0) = (a_1 + a_2, -a_2, -a_1)$$


and so the question we are asking is: can every element $(x,y,z)$ of V be
written in the form $(a_1 + a_2, -a_2, -a_1)$? In other words, can we solve
the system

$$a_1 + a_2 = x$$
$$-a_2 = y$$
$$-a_1 = z$$

for every $(x,y,z) \in V$? Well, one can solve for $a_1$ and $a_2$ as

$$a_1 := -z, \quad a_2 := -y.$$

The first equation then becomes $-z - y = x$, but this equation is valid
because we are assuming that $(x,y,z) \in V$, so that $x + y + z = 0$.
(This is not all that surprising a coincidence: the vectors $v_1$ and
$v_2$ were chosen to be in V, which explains why the linear combination
$a_1 v_1 + a_2 v_2$ must also be in V.) Thus every vector in V can be written as
a linear combination of $v_1$ and $v_2$. Also, these two vectors are linearly
independent (why?), and so $\{v_1, v_2\} = \{(1,0,-1), (1,-1,0)\}$ is a basis
for V.
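As a sanity check on this computation, here is a small Python sketch (not from the original notes) that rebuilds several vectors of V from the coefficients $a_1 = -z$, $a_2 = -y$. The function names `combine` and `coefficients` are made up for this illustration.

```python
V1 = (1, 0, -1)
V2 = (1, -1, 0)

def combine(a1, a2):
    """Return the linear combination a1*V1 + a2*V2 as a tuple."""
    return tuple(a1 * p + a2 * q for p, q in zip(V1, V2))

def coefficients(x, y, z):
    """Coefficients a1 = -z, a2 = -y; only valid when x + y + z = 0."""
    if x + y + z != 0:
        raise ValueError("vector does not lie in V")
    return -z, -y

# Each test vector has co-ordinates summing to zero, so it lies in V.
for vec in [(3, 5, -8), (1, 0, -1), (0, 0, 0), (-2, 7, -5)]:
    a1, a2 = coefficients(*vec)
    assert combine(a1, a2) == vec
```

A check on finitely many vectors is of course not a proof; the algebra above is what shows that the formula works for every element of V.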


It is clear from the above that this is not the only basis available for V;
for instance, $\{(1,0,-1), (0,1,-1)\}$ is also a basis. In fact, as it turns
out, any two linearly independent vectors in V can be used to form 
a basis for V. Because of this, we say that V is two-dimensional. It 
turns out (and this is actually a rather deep fact) that many of the 
vector spaces V we will deal with have some finite dimension d, which 
means that any d linearly independent vectors in V automatically form 
a basis; more on this later. 


A philosophical point: we now see that there are (at least) two ways
to construct vector spaces. One is to start with a "big" vector space,
say $\mathbf{R}^3$, and then impose constraints such as $x + y + z = 0$ to cut the
vector space down in size to obtain the target vector space, in this case
V. An opposing way to make vector spaces is to start with nothing,
and throw in vectors one at a time (in this case, $v_1$ and $v_2$) to build
up to the target vector space (which is also V). A basis embodies this
second, "bottom-up" philosophy.


* * * * *


Rigorous treatment of bases 


Having looked at some examples of how to construct bases, let us now 
introduce some theory to make the above algorithm rigorous. 


Theorem 1. Let V be a vector space, and let S be a linearly independent
subset of V. Let v be a vector which does not lie in S.

(a) If v lies in span(S), then $S \cup \{v\}$ is linearly dependent, and
$\mathrm{span}(S \cup \{v\}) = \mathrm{span}(S)$.

(b) If v does not lie in span(S), then $S \cup \{v\}$ is linearly independent,
and $\mathrm{span}(S \cup \{v\}) \supsetneq \mathrm{span}(S)$.


This theorem justifies our previous reasoning: if a linearly independent
set S does not span V, then one can make the span bigger by adding
a vector outside of span(S); this will also keep S linearly independent.


Proof We first prove (a). If v lies in span(S), then by definition of
span, v must be a linear combination of S, i.e. there exist vectors
$v_1, \ldots, v_n$ in S and scalars $a_1, \ldots, a_n$ such that

$$v = a_1 v_1 + \ldots + a_n v_n$$

and thus

$$0 = (-1)v + a_1 v_1 + \ldots + a_n v_n.$$

Thus 0 is a non-trivial linear combination of $v, v_1, \ldots, v_n$ (it is
non-trivial because the coefficient $-1$ in front of v is non-zero; note that
since $v \notin S$, this coefficient cannot be cancelled by any of the $v_j$).
Thus $S \cup \{v\}$ is linearly dependent. Furthermore, since v is a linear
combination of $v_1, \ldots, v_n$, any linear combination of v and $v_1, \ldots, v_n$
can be re-expressed as a linear combination just of $v_1, \ldots, v_n$ (why?).
Thus $\mathrm{span}(S \cup \{v\})$ does not contain any additional elements which are
not already in span(S). On the other hand, every element in span(S)
is clearly also in $\mathrm{span}(S \cup \{v\})$. Thus $\mathrm{span}(S \cup \{v\})$ and span(S) have
precisely the same set of elements, i.e. $\mathrm{span}(S \cup \{v\}) = \mathrm{span}(S)$.


• Now we prove (b). Suppose $v \notin \mathrm{span}(S)$. Clearly $\mathrm{span}(S \cup \{v\})$
contains span(S), since every linear combination of S is automatically a
linear combination of $S \cup \{v\}$. But $\mathrm{span}(S \cup \{v\})$ clearly also
contains v, which is not in span(S). Thus
$\mathrm{span}(S \cup \{v\}) \supsetneq \mathrm{span}(S)$.


• Now we prove that $S \cup \{v\}$ is linearly independent. Suppose for
contradiction that $S \cup \{v\}$ was linearly dependent. This means that there
is some non-trivial way to write 0 as a linear combination of v and some
vectors $v_1, \ldots, v_n$ in S:

$$0 = av + a_1 v_1 + \ldots + a_n v_n.$$

If $a$ were zero, then we would be writing 0 as a non-trivial linear
combination of elements $v_1, \ldots, v_n$ in S, but this contradicts the
hypothesis that S is linearly independent. Thus $a$ is non-zero. But then we
may divide by $a$ and conclude that

$$v = \left(-\frac{a_1}{a}\right) v_1 + \ldots + \left(-\frac{a_n}{a}\right) v_n,$$

so that v is a linear combination of $v_1, \ldots, v_n$, so it is in the span of S,
a contradiction. Thus $S \cup \{v\}$ is linearly independent.
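Both cases of Theorem 1 can be observed numerically. The sketch below is an illustration only (it is not from the notes): it assumes vectors in $\mathbf{R}^n$ with rational entries, and the helpers `rank`, `is_independent` and `in_span` are hypothetical names. The rank is computed by ordinary Gaussian elimination, and a vector lies in the span of a list exactly when appending it does not raise the rank.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    return rank(vectors) == len(vectors)

def in_span(S, v):
    return rank(S + [v]) == rank(S)

S = [(1, 0, -1), (1, -1, 0)]      # linearly independent
v = (0, 1, -1)                    # equals S[0] - S[1], so it lies in span(S)
w = (1, 1, 1)                     # co-ordinates do not sum to zero

print(in_span(S, v), is_independent(S + [v]))  # True False   (case (a))
print(in_span(S, w), is_independent(S + [w]))  # False True   (case (b))
```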


* * * * *


Dimension 


• As we saw in previous examples, a vector space may have several
bases. For instance, if $V := \{(x,y,z) \in \mathbf{R}^3 : x + y + z = 0\}$, then
$\{(1,0,-1), (1,-1,0)\}$ is a basis, but so is $\{(1,0,-1), (0,1,-1)\}$.


• If V was the line $\{(t,t,t) : t \in \mathbf{R}\}$, then $\{(1,1,1)\}$ is a basis, but
so is $\{(2,2,2)\}$. (On the other hand, $\{(1,1,1), (2,2,2)\}$ is not a basis
because it is linearly dependent).


• If V was the zero vector space {0}, then the empty set {} is a basis
(why?), but {0} is not (why?).

• In $\mathbf{R}^3$, the three vectors $\{(1,0,0), (0,1,0), (0,0,1)\}$ form a basis, and
there are many other examples of three vectors which form a basis in
$\mathbf{R}^3$ (for instance, $\{(1,1,0), (1,-1,0), (0,0,1)\}$). As we shall see, any
set of two or fewer vectors cannot be a basis for $\mathbf{R}^3$ because they cannot
span all of $\mathbf{R}^3$, while any set of four or more vectors cannot be a basis
for $\mathbf{R}^3$ because they become linearly dependent.


• One thing that one sees from these examples is that all the bases of a
vector space seem to contain the same number of vectors. For instance,
$\mathbf{R}^3$ always seems to need exactly three vectors to make a basis, and so
forth. The reason for this is in fact rather deep, and we will now give the
proof. The first step is to prove the following rather technical result,
which says that one can "edit" a spanning set by inserting a fixed
linearly independent set, while removing an equal number of vectors
from the previous spanning set.


• Replacement Theorem. Let V be a vector space, and let S be a
finite subset of V which spans V (i.e. span(S) = V). Suppose that S
has exactly n elements. Now let L be another finite subset of V which is
linearly independent and has exactly m elements. Then m is less than
or equal to n. Furthermore, we can find a subset $S'$ of S containing
exactly $n - m$ elements such that $S' \cup L$ also spans V.

• This theorem is not by itself particularly interesting, but we can use it
to imply a more interesting Corollary, below.

• Proof We induct on m. The base case is $m = 0$. Here it is obvious
that $n \geq m$. Also, if we just set $S'$ equal to S, then $S'$ has exactly
$n - m$ elements, and $S' \cup L$ is equal to S (since L is empty) and so
obviously spans V by hypothesis.


• Now suppose inductively that $m > 0$, and that we have already proven
the theorem for $m - 1$. Since L has m elements, we may write it as

$$L = \{v_1, \ldots, v_m\}.$$

Since $\{v_1, \ldots, v_m\}$ is linearly independent, the set
$\tilde L := \{v_1, \ldots, v_{m-1}\}$ is also linearly independent (why?). We can
now apply the induction hypothesis with m replaced by $m - 1$ and L
replaced by $\tilde L$. This tells us that $n \geq m - 1$, and also there is some
subset $\tilde S'$ of S with exactly $n - m + 1$ elements, such that
$\tilde S' \cup \tilde L$ spans V.


Write $\tilde S' = \{w_1, \ldots, w_{n-m+1}\}$. To prove that $n \geq m$, we have to
exclude the possibility that $n = m - 1$. We do this as follows. Consider the
vector $v_m$, which is in L but not in $\tilde L$. Since the set

$$\tilde S' \cup \tilde L = \{v_1, \ldots, v_{m-1}, w_1, \ldots, w_{n-m+1}\}$$

spans V, we can write $v_m$ as a linear combination of $\tilde S' \cup \tilde L$. In
other words, we have

$$v_m = a_1 v_1 + \ldots + a_{m-1} v_{m-1} + b_1 w_1 + \ldots + b_{n-m+1} w_{n-m+1} \qquad (0.2)$$

for some scalars $a_1, \ldots, a_{m-1}, b_1, \ldots, b_{n-m+1}$.

Suppose for contradiction that $n = m - 1$. Then $\tilde S'$ is empty, and there
are no vectors $w_1, \ldots, w_{n-m+1}$. We thus have

$$v_m = a_1 v_1 + \ldots + a_{m-1} v_{m-1} \qquad (0.3)$$

so that

$$0 = a_1 v_1 + \ldots + a_{m-1} v_{m-1} + (-1) v_m,$$

but this contradicts the hypothesis that $\{v_1, \ldots, v_m\}$ is linearly
independent. Thus n cannot equal $m - 1$, and so must be greater than or
equal to m.


We now have $n \geq m$, so that there is at least one vector in
$w_1, \ldots, w_{n-m+1}$. Since we know the set

$$\tilde S' \cup \tilde L = \{v_1, \ldots, v_{m-1}, w_1, \ldots, w_{n-m+1}\}$$

spans V, it is clear that

$$\tilde S' \cup L = \{v_1, \ldots, v_m, w_1, \ldots, w_{n-m+1}\}$$

also spans V (adding an element cannot decrease the span). To finish
the proof we need to eliminate one of the vectors $w_j$, to cut $\tilde S'$ down to
a set $S'$ of size $n - m$, while still making $S' \cup L$ span V.

• We first observe that the $b_1, \ldots, b_{n-m+1}$ cannot all be zero, otherwise
we would be back to equation (0.3) again, which leads to contradiction.
So at least one of the b's must be non-zero; since the order of the vectors
$w_j$ is irrelevant, let's say that $b_1$ is the one which is non-zero. But then
we can divide by $b_1$, and use (0.2) to solve for $w_1$:

$$w_1 = \frac{1}{b_1} v_m - \frac{a_1}{b_1} v_1 - \ldots - \frac{a_{m-1}}{b_1} v_{m-1} - \frac{b_2}{b_1} w_2 - \ldots - \frac{b_{n-m+1}}{b_1} w_{n-m+1}.$$

Thus $w_1$ is a linear combination of $\{v_1, \ldots, v_m, w_2, \ldots, w_{n-m+1}\}$. In
other words, if we write $S' := \{w_2, \ldots, w_{n-m+1}\}$, then $w_1$ is a linear
combination of $S' \cup L$. Thus by Theorem 1,

$$\mathrm{span}(S' \cup L) = \mathrm{span}(S' \cup L \cup \{w_1\}).$$

But $S' \cup L \cup \{w_1\}$ is just $\tilde S' \cup L$, which spans V. Thus $S' \cup L$
spans V. Since $S'$ has exactly $n - m$ elements, we are done.


Corollary 1 Suppose that a vector space V contains a finite basis B 
which consists of exactly d elements. Then: 


(a) Any set $S \subset V$ consisting of fewer than d elements cannot span
V. (In other words, every spanning set of V must contain at least d
elements).

(b) Any set $S \subset V$ consisting of more than d elements must be linearly
dependent. (In other words, every linearly independent set in V can
contain at most d elements).

(c) Any basis of V must consist of exactly d elements.

(d) Any spanning set of V with exactly d elements forms a basis.

(e) Any set of d linearly independent elements of V forms a basis.


(f) Any set of linearly independent elements of V is contained in a 
basis. 


(g) Any spanning set of V contains a basis. 




Proof We first prove (a). Let S have $d'$ elements for some $d' < d$.
Suppose for contradiction that S spanned V. Since B is linearly
independent, we may apply the Replacement Theorem (with B playing
the role of L) to conclude that $d' \geq d$, a contradiction. Thus S cannot
span V.


Now we prove (b). First suppose that S is finite, so that S has $d'$
elements for some $d' > d$. Suppose for contradiction that S is linearly
independent. Since B spans V, we can apply the Replacement Theorem
(with B playing the role of S, while S instead plays the role of L) to
conclude that $d \geq d'$, a contradiction. So we've proven (b) when S
is finite. When S is infinite, we can find a finite subset $S'$ of S with,
say, $d + 1$ elements; since we've already proven (b) for finite subsets,
we know that $S'$ is linearly dependent. But this implies that S is also
linearly dependent.


Now we prove (c). Let B’ be any basis of V. Since B’ spans, it must 
contain at least d elements, by (a). Since B’ is linearly independent, it 
must contain at most d elements, by (b). Thus it must contain exactly 
d elements. 


Now we prove (d). Let S be a spanning set of V with exactly d
elements. To show that S is a basis, we need to show that S is linearly
independent. Suppose for contradiction that S was linearly dependent.
Then by a theorem in page 34 of last week's notes, there exists a vector
v in S such that $\mathrm{span}(S - \{v\}) = \mathrm{span}(S)$. Thus $S - \{v\}$ also spans V,
but it has fewer than d elements, contradicting (a). Thus S must be
linearly independent.


Now we prove (e). Let L be a linearly independent set in V with exactly
d elements. To show that L is a basis, we need to show that L spans V.
Suppose for contradiction that L did not span; then there must be some
vector v which is not in the span of L. But by Theorem 1 in this week's
notes, $L \cup \{v\}$ is linearly independent. But this set has more than d
elements, contradicting (b). Thus L must span V.


Now we prove (f). Let L be a linearly independent set in V; by (b),
we know it has $d'$ elements for some $d' \leq d$. Applying the Replacement
Theorem (with B playing the role of the spanning set S), we see that
there is some subset $S'$ of B with $d - d'$ elements such that $S' \cup L$ spans
V. Since $S'$ has $d - d'$ elements and L has $d'$ elements, $S' \cup L$ can have
at most d elements; actually it must have exactly d, else it would not
span by (a). But then by (d) it must be a basis. Thus L is contained
in a basis.


Now we prove (g). Let S be a spanning set in V. To build a basis
inside S, we see by (e) that we just need to find d linearly independent
vectors in S. Suppose for contradiction that we can only find at most
$d'$ linearly independent vectors in S for some $d' < d$. Let $v_1, \ldots, v_{d'}$ be
$d'$ such linearly independent vectors in S. Then every other vector v in
S must be a linear combination of $v_1, \ldots, v_{d'}$, otherwise we could add
v to $\{v_1, \ldots, v_{d'}\}$ and obtain a larger collection of linearly independent
vectors in S (see Theorem 1). But if every vector in S is a linear
combination of $v_1, \ldots, v_{d'}$, and S spans V, then $v_1, \ldots, v_{d'}$ must span
V. By (a) this means that $d' \geq d$, contradiction. Thus we must be able
to find d linearly independent vectors in S, and so S contains a basis.
□
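Part (g) can be turned into a simple procedure for finite spanning sets of $\mathbf{R}^n$: scan the set and keep a vector only when it is not already in the span of the vectors kept so far (this is the Theorem 1 test). The Python sketch below is one possible illustration under that assumption, not the argument used in the proof; `rank` and `extract_basis` are hypothetical helper names.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(spanning_set):
    """Greedily keep each vector that is not in the span of those already kept."""
    basis = []
    for v in spanning_set:
        if rank(basis + [v]) > rank(basis):
            basis.append(v)
    return basis

# Six vectors spanning R^3; three of them are redundant.
S = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(extract_basis(S))  # [(1, 0, 0), (1, 1, 0), (0, 0, 1)]
```

The kept vectors are linearly independent by construction and have the same span as S, so they form a basis; which basis one gets depends on the order in which S is scanned.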


Definition We say that V has dimension d if it contains a basis of d 
elements (and so that all the consequences of the Corollary 1 follow). 
We say that V is finite-dimensional if it has dimension d for some finite 
number d, otherwise we say that V is infinite-dimensional. 


From Corollary 1 we see that all bases have the same number of ele- 
ments, so a vector space cannot have two different dimensions. (e.g. 
a vector space cannot be simultaneously two-dimensional and three- 
dimensional). We sometimes use dim(V) to denote the dimension of 
V. One can think of dim(V) as the number of degrees of freedom inher- 
ent in V (or equivalently, the number of possible linearly independent 
vectors in V). 


Example The vector space $\mathbf{R}^3$ has a basis $\{(1,0,0), (0,1,0), (0,0,1)\}$,
and thus has dimension 3. Thus any three linearly independent vectors
in $\mathbf{R}^3$ will span $\mathbf{R}^3$ and form a basis.


Example The vector space $P_n(\mathbf{R})$ of polynomials of degree $\leq n$ has
basis $\{1, x, x^2, \ldots, x^n\}$ and thus has dimension $n + 1$.


• Example The zero vector space {0} has a basis {} and thus has
dimension zero. (It is the only vector space with dimension zero - why?)

• Example The vector space $P(\mathbf{R})$ of all polynomials is infinite
dimensional. To see this, suppose for contradiction that it had some finite
dimension d. But then one could not have more than d linearly
independent elements. But the set $\{1, x, x^2, \ldots, x^d\}$ contains $d + 1$
elements which are linearly independent (why?), contradiction. Thus
$P(\mathbf{R})$ is infinite dimensional.

• As we have seen, every finite dimensional space has a basis. It is also
true that infinite-dimensional spaces have bases, but this is
significantly harder to prove and beyond the scope of this course.


* * * * *


Subspaces and dimension 


• We now prove an intuitively obvious statement about subspaces and
dimension:

• Theorem 2. Let V be a finite-dimensional vector space, and let W
be a subspace of V. Then W is also finite-dimensional, and
$\dim(W) \leq \dim(V)$. Furthermore, the only way that dim(W) can equal
dim(V) is if $W = V$.


• Proof. We first construct a finite basis for W via the following
algorithm. If $W = \{0\}$, then we can use the empty set as a basis. Now
suppose that $W \neq \{0\}$. Then we can find a non-zero vector $v_1$ in W.
If $v_1$ spans W, then we have found a basis for W. If $v_1$ does not span
W, then we can find a vector $v_2$ in W which does not lie in $\mathrm{span}(\{v_1\})$;
by Theorem 1, $\{v_1, v_2\}$ is linearly independent. If this set spans W, then
we have found a basis for W. Otherwise, we can find a vector $v_3$ in W
which does not lie in $\mathrm{span}(\{v_1, v_2\})$. By Theorem 1, $\{v_1, v_2, v_3\}$ is
linearly independent. We continue in this manner until we finally span W.
Note that we must stop before we exceed dim(V) vectors, since from part (b)
of Corollary 1 we cannot make a linearly independent set
with more than dim(V) vectors. Thus this algorithm must eventually
generate a basis of W which consists of at most dim(V) vectors, which
implies that W is finite-dimensional with $\dim(W) \leq \dim(V)$.

• Now suppose that dim(W) = dim(V). Then W has a basis B which
consists of dim(V) elements; B is of course linearly independent. But
then by part (e) of Corollary 1, B is also a basis for V. Thus span(B) =
V and span(B) = W, which implies that W = V as desired.


* * * * *


Lagrange interpolation 


• We now give an application of this abstract theory to a basic problem:
how to fit a polynomial to a specified number of points.

• Everyone knows that given two points in the plane, one can find a line
joining them. A more precise way of saying this is that given two data
points $(x_1, y_1)$ and $(x_2, y_2)$ in $\mathbf{R}^2$, with $x_1 \neq x_2$, we can find a line
$y = mx + b$ which passes through both these points. (We need $x_1 \neq x_2$,
otherwise the line will have infinite slope).

• Now suppose we have three points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ in the
plane, with $x_1, x_2, x_3$ all distinct. Then one usually cannot fit a line which
goes exactly through these three data points. (One can still do a best fit to
these data points by a straight line, e.g. by using the least squares fit;
this is an important topic but not one we will address now). However,
it turns out that one can still fit a parabola $y = ax^2 + bx + c$ to these
three points. With four points, one cannot always fit a parabola, but
one can always fit a cubic. More generally:


• Theorem 3 (Lagrange interpolation formula) Let $n \geq 1$, and
let $(x_1, y_1), \ldots, (x_n, y_n)$ be n points in $\mathbf{R}^2$ such that $x_1, x_2, \ldots, x_n$ are
all distinct. Then there exists a unique polynomial $f \in P_{n-1}(\mathbf{R})$ (i.e. of
degree $\leq n - 1$) such that the curve $y = f(x)$ passes through all n
points $(x_1, y_1), \ldots, (x_n, y_n)$. In other words, we have $y_j = f(x_j)$ for all
$j = 1, \ldots, n$. Furthermore, f is given by the formula

$$f(x) = \sum_{j=1}^{n} y_j \frac{\prod_{1 \leq k \leq n : k \neq j} (x - x_k)}{\prod_{1 \leq k \leq n : k \neq j} (x_j - x_k)}.$$


• The polynomial f is sometimes called the interpolating polynomial for
the points $(x_1, y_1), \ldots, (x_n, y_n)$; in some sense it is the simplest object
that can pass through all n points. These interpolating polynomials
have several uses, for instance in taking a sequence of still images, and
finding a smooth sequence of intermediate images to fit between these
images.


To prove this theorem, we first proceed by considering some simple 
examples. 


First suppose that $y_1 = y_2 = \ldots = y_n = 0$. Then the choice of
interpolating polynomial is obvious: just take the zero polynomial $f(x) = 0$.

Now let's take the next simplest case, when $y_1 = 1$ and
$y_2 = y_3 = \ldots = y_n = 0$. The interpolating polynomial f that we need here
must obey the conditions $f(x_1) = 1$, and $f(x_2) = \ldots = f(x_n) = 0$.
Since f has zeroes at $x_2, \ldots, x_n$, it must have factors of $(x - x_2)$,
$(x - x_3)$, ..., $(x - x_n)$. So it must look like

$$f(x) = Q(x)(x - x_2) \ldots (x - x_n).$$

Since $(x - x_2) \ldots (x - x_n)$ has degree $n - 1$, and we want f to have
degree at most $n - 1$, $Q(x)$ must be constant, say $Q(x) = c$:

$$f(x) = c(x - x_2) \ldots (x - x_n).$$

To find out what c is, we use the extra fact that $f(x_1) = 1$, so

$$1 = c(x_1 - x_2) \ldots (x_1 - x_n).$$


Thus the interpolating polynomial is given by $f_1$, where

$$f_1(x) := \frac{(x - x_2) \ldots (x - x_n)}{(x_1 - x_2) \ldots (x_1 - x_n)}$$

or equivalently

$$f_1(x) := \frac{\prod_{k=2}^{n} (x - x_k)}{\prod_{k=2}^{n} (x_1 - x_k)}.$$

One can see by inspection that indeed $f_1(x_1)$ is equal to 1, while
$f_1(x_2) = \ldots = f_1(x_n) = 0$.


• Now consider the case when $y_j = 1$ for some $1 \leq j \leq n$, and $y_k = 0$
for all other $k \neq j$ (the earlier case being the special case when $j = 1$).
Then a similar argument gives that f must equal $f_j$, where $f_j$ is the
polynomial

$$f_j(x) := \frac{\prod_{1 \leq k \leq n : k \neq j} (x - x_k)}{\prod_{1 \leq k \leq n : k \neq j} (x_j - x_k)}.$$

For instance, if $n = 4$ and $j = 2$, then

$$f_2(x) = \frac{(x - x_1)(x - x_3)(x - x_4)}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)}.$$


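• To summarize, for each $1 \leq j \leq n$, we can find a polynomial
$f_j \in P_{n-1}(\mathbf{R})$ such that $f_j(x_j) = 1$ and $f_j(x_k) = 0$ for $k \neq j$. Thus,
for instance, when $n = 4$, then we have

$$f_1(x_1) = 1, \quad f_1(x_2) = f_1(x_3) = f_1(x_4) = 0$$
$$f_2(x_2) = 1, \quad f_2(x_1) = f_2(x_3) = f_2(x_4) = 0$$
$$f_3(x_3) = 1, \quad f_3(x_1) = f_3(x_2) = f_3(x_4) = 0$$
$$f_4(x_4) = 1, \quad f_4(x_1) = f_4(x_2) = f_4(x_3) = 0.$$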


• To proceed further we need a key lemma.

• Lemma 4. The set $\{f_1, f_2, \ldots, f_n\}$ is a basis for $P_{n-1}(\mathbf{R})$.

• Proof. We already know that $P_{n-1}(\mathbf{R})$ is n-dimensional, since it has a
basis $\{1, x, x^2, \ldots, x^{n-1}\}$ of n elements. Since $\{f_1, \ldots, f_n\}$ also has n
elements, to show that it is a basis it will suffice by part (e) of Corollary
1 to show that $\{f_1, \ldots, f_n\}$ is linearly independent.


• Suppose for contradiction that $\{f_1, \ldots, f_n\}$ was linearly dependent.
This means that there exist scalars $a_1, \ldots, a_n$, not all zero, such that
$a_1 f_1 + a_2 f_2 + \ldots + a_n f_n$ is the zero polynomial, i.e.

$$a_1 f_1(x) + a_2 f_2(x) + \ldots + a_n f_n(x) = 0 \text{ for all } x.$$

In particular, we have

$$a_1 f_1(x_1) + a_2 f_2(x_1) + \ldots + a_n f_n(x_1) = 0.$$

But since $f_1(x_1) = 1$ and $f_2(x_1) = \ldots = f_n(x_1) = 0$, we thus have

$$a_1 \times 1 + a_2 \times 0 + \ldots + a_n \times 0 = 0,$$

i.e. $a_1 = 0$. A similar argument gives that $a_2 = 0$, $a_3 = 0$, ... -
contradicting the assumption that the $a_j$ were not all zero. Thus
$\{f_1, \ldots, f_n\}$ is linearly independent, and is thus a basis by Corollary 1.


From Lemma 4 we know that $\{f_1, \ldots, f_n\}$ spans $P_{n-1}(\mathbf{R})$. Thus every
polynomial $f \in P_{n-1}(\mathbf{R})$ can be written in the form

$$f = a_1 f_1 + \ldots + a_n f_n \qquad (0.4)$$

for some scalars $a_1, \ldots, a_n$. In particular, the interpolating polynomial
between the data points $(x_1, y_1), \ldots, (x_n, y_n)$ must have this form. So
to work out what the interpolating polynomial is, we just have to work
out what the scalars $a_1, \ldots, a_n$ are.


In order for f to be an interpolating polynomial, we need $f(x_1) = y_1$,
$f(x_2) = y_2$, etc. Let's look at the first condition $f(x_1) = y_1$. Using
(0.4), we have

$$f(x_1) = a_1 f_1(x_1) + \ldots + a_n f_n(x_1) = y_1.$$

But by arguing as in the lemma, we have

$$a_1 f_1(x_1) + \ldots + a_n f_n(x_1) = a_1 \times 1 + a_2 \times 0 + \ldots + a_n \times 0 = a_1.$$

Thus we must have $a_1 = y_1$. More generally, we see that $a_2 = y_2$,
$a_3 = y_3$, .... Thus the only possible choice of interpolating polynomial
is

$$f = y_1 f_1 + \ldots + y_n f_n = \sum_{j=1}^{n} y_j f_j \qquad (0.5)$$

which is the Lagrange interpolation formula. Conversely, it is easy
to check that if we define f by the formula (0.5), then $f(x_1) = y_1$,
$f(x_2) = y_2$, etc., so f is indeed the unique interpolating polynomial
between the data points $(x_1, y_1), \ldots, (x_n, y_n)$.




• As an example, suppose one wants to interpolate a quadratic polynomial
between the points $(1,0)$, $(2,2)$, and $(3,1)$, so that $x_1 := 1$,
$x_2 := 2$, $x_3 := 3$, $y_1 := 0$, $y_2 := 2$, $y_3 := 1$. The formulae for
$f_1, f_2, f_3$ are

$$f_1 = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)} = \frac{(x - 2)(x - 3)}{(1 - 2)(1 - 3)}$$

$$f_2 = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)} = \frac{(x - 1)(x - 3)}{(2 - 1)(2 - 3)}$$

$$f_3 = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)} = \frac{(x - 1)(x - 2)}{(3 - 1)(3 - 2)}$$

and so the interpolating polynomial is

$$f = 0 f_1 + 2 f_2 + 1 f_3 = 2 \frac{(x - 1)(x - 3)}{(2 - 1)(2 - 3)} + \frac{(x - 1)(x - 2)}{(3 - 1)(3 - 2)}.$$

You can check by direct substitution that $f(1) = 0$, $f(2) = 2$, and
$f(3) = 1$ as desired. After a lot of algebra one can simplify f to a more
standard form

$$f = -3x^2/2 + 13x/2 - 5.$$
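The formula is easy to evaluate by machine. The following Python sketch (an illustration, not part of the notes; the names `lagrange_basis` and `interpolate` are made up here) evaluates the basis polynomials $f_j$ and the interpolant with exact rational arithmetic, and compares the result for the points $(1,0)$, $(2,2)$, $(3,1)$ against the simplified form $-3x^2/2 + 13x/2 - 5$.

```python
from fractions import Fraction

def lagrange_basis(xs, j, x):
    """Evaluate f_j(x) = prod over k != j of (x - x_k) / (x_j - x_k)."""
    value = Fraction(1)
    for k, xk in enumerate(xs):
        if k != j:
            value *= Fraction(x - xk) / Fraction(xs[j] - xk)
    return value

def interpolate(points, x):
    """Evaluate the interpolating polynomial sum_j y_j f_j at x."""
    xs = [p[0] for p in points]
    return sum(y * lagrange_basis(xs, j, x) for j, (_, y) in enumerate(points))

points = [(1, 0), (2, 2), (3, 1)]

def simplified(x):
    """The simplified form -3x^2/2 + 13x/2 - 5 from the text."""
    return Fraction(-3, 2) * x**2 + Fraction(13, 2) * x - 5

for x in range(-3, 7):
    assert interpolate(points, x) == simplified(x)

print([interpolate(points, x) for x, _ in points])
```

The data points must have distinct x co-ordinates; otherwise one of the denominators $x_j - x_k$ is zero and the division fails, matching the hypothesis of the theorem.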


• If one were to interpolate a single point $(x_1, y_1)$, one would just get the
constant polynomial $f = y_1$, which is of course the only polynomial of
degree 0 which passes through $(x_1, y_1)$.


• The Lagrange interpolation formula says that there is exactly one
polynomial of degree at most $n - 1$ which passes through n given points.
However, if one is willing to use more complicated polynomials (i.e.
polynomials of degree higher than $n - 1$) then there are infinitely many
more ways to interpolate those data points. For instance, take the
points $(0,0)$ and $(1,1)$. There is only one linear polynomial which
interpolates these points - the polynomial $f(x) := x$. But there are
many quadratic polynomials which also interpolate these two points:
$f(x) = x^2$ will work, as will $f(x) = \frac{1}{2}x^2 + \frac{1}{2}x$, or in fact any
polynomial of the form $(1 - \theta)x^2 + \theta x$. And with cubic polynomials
there are even more possibilities. The point is that each degree you add to
the polynomial adds one more degree of freedom (remember that the
dimension of $P_n(\mathbf{R})$ is $n + 1$), and so it becomes increasingly easier to
satisfy a fixed number of constraints (in this example there are only two
constraints, one for each data point). This is part of a more general
principle: when the number of degrees of freedom exceeds the number of
constraints, then usually one has many solutions to a problem. When the
number of constraints exceeds the number of degrees of freedom, one usually
has no solutions to a problem. When the number of constraints exactly
equals the number of degrees of freedom, one usually has exactly one
solution to a problem. We will make this principle more precise later
in this course.
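To see the extra degree of freedom concretely, the short Python sketch below (illustrative only; the function name `quadratic` and the sample values of the parameter are arbitrary choices) checks that every member of the family $(1 - \theta)x^2 + \theta x$ passes through both $(0,0)$ and $(1,1)$, even though different values of $\theta$ give different polynomials.

```python
from fractions import Fraction

def quadratic(theta, x):
    """Evaluate (1 - theta) * x^2 + theta * x."""
    return (1 - theta) * x**2 + theta * x

thetas = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(-3), Fraction(7, 4)]

for theta in thetas:
    # Both constraints hold for every theta ...
    assert quadratic(theta, 0) == 0
    assert quadratic(theta, 1) == 1

# ... but the polynomials themselves differ, e.g. at x = 2.
print([quadratic(theta, 2) for theta in thetas])
```

Here there are three coefficients (degrees of freedom) but only two constraints, which leaves the one free parameter $\theta$.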


* * * * *


Linear transformations 


• Up until now we have studied each vector space in isolation, and looked
at what one can do with the vectors in that vector space. However,
this is only a very limited portion of linear algebra. To appreciate
the full power of linear algebra, we have to not only understand each
vector space individually, but also all the various linear transformations
between one vector space and another.

• A transformation from one set X to another set Y is just a function
$f : X \to Y$ whose domain is X and whose range is in Y. The set
of all possible transformations is extremely large. In linear algebra,
however, we are not concerned with all types of transformations, but
only a very special type known as linear transformations. These are
transformations from one vector space to another which preserve the
additive and scalar multiplicative structure:


• Definition. Let X, Y be vector spaces. A linear transformation T
from X to Y is any transformation $T : X \to Y$ which obeys the
following two properties:

• (T preserves vector addition) For any $x, x' \in X$, $T(x + x') = Tx + Tx'$.

• (T preserves scalar multiplication) For any $x \in X$ and any scalar
$c \in \mathbf{R}$, $T(cx) = cTx$.


• Note that there are now two types of vectors: vectors in X and vectors
in Y. In some cases, X and Y will be the same space, but other times
they will not. So one should take a little care; for instance one cannot
necessarily add a vector in X to a vector in Y. In the above definition, $x$
and $x'$ were vectors in X, so $x + x'$ used the X vector addition rule, but
$Tx$ and $Tx'$ were vectors in Y, so $Tx + Tx'$ used the Y vector addition
rule. (An expression like $x + Tx$ would not necessarily make sense,
unless X and Y were equal, or at least contained inside a common
vector space).


The two properties of a linear transformation can be described as fol- 
lows: if you combine two inputs, then the outputs also combine (the 
whole is equal to the sum of its parts); and if you amplify an input by a 
constant, the output also amplifies by the same constant (another way 
of saying this is that the transformation is homogeneous). 


To test whether a transformation is linear, you have to check separately
whether it preserves vector addition, and whether it preserves scalar
multiplication. It is possible to combine the two checks into one: if
you can check that for every scalar $c \in \mathbf{R}$ and vectors $x, x' \in X$,
that $T(cx + x') = cTx + Tx'$, then T is automatically a linear
transformation (see homework).


Scalar multiplication as a linear transformation. A very simple
example of a linear transformation is the map $T : \mathbf{R} \to \mathbf{R}$ defined
by $Tx := 3x$ - it maps a scalar to three times that scalar. It is clear
that this map preserves addition and multiplication. An example of a
non-linear transformation is the map $T : \mathbf{R} \to \mathbf{R}$ defined by $Tx := x^2$.
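The combined test $T(cx + x') = cTx + Tx'$ can be tried on sample inputs. The Python sketch below is a rough illustration (the name `looks_linear` and the sample values are arbitrary): passing on finitely many samples does not prove linearity, but a single failure is enough to show that a map is not linear.

```python
def looks_linear(T, samples, scalars):
    """Check T(c*x + y) == c*T(x) + T(y) on all given samples and scalars."""
    return all(
        T(c * x + y) == c * T(x) + T(y)
        for c in scalars
        for x in samples
        for y in samples
    )

def triple(x):
    return 3 * x      # the map Tx := 3x

def square(x):
    return x ** 2     # the map Tx := x^2

samples = [-2, 0, 1, 5]
scalars = [-1, 0, 2, 3]

print(looks_linear(triple, samples, scalars))  # True
print(looks_linear(square, samples, scalars))  # False
```

For the squaring map, taking $c = 2$ and $x = x' = 1$ already gives $T(3) = 9$ on one side and $2 \cdot 1 + 1 = 3$ on the other.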


Dilations as a linear transformation As a variation of this theme,
given any vector space V, the map $T : V \to V$ given by $Tx := 3x$ is
a linear transformation (why?). This transformation takes vectors and
dilates them by 3.

The identity as a linear transformation A special case of dilations
is the dilation by 1: $Ix = x$. This is a linear transformation from V to
V, known as the identity transformation, and is usually called $I$ or $I_V$.


Zero as a linear transformation Another special case is dilation by 
0: Tx = 0. This is a linear transformation from V to V, called the zero 
transformation. 




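• Another example of a linear transformation is the map T : R^2 → R^3
defined by

$$ Tx := \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} x, $$

where we temporarily think of the vectors in R^2 and R^3 as column
vectors. In other words,

$$ T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \\ 5x_1 + 6x_2 \end{pmatrix}. $$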


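• Let’s check that T preserves vector addition. If x, x' are two vectors
in R^2, say

$$ x := \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad x' := \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}, $$

then

$$ T(x + x') = T \begin{pmatrix} x_1 + x'_1 \\ x_2 + x'_2 \end{pmatrix} = \begin{pmatrix} (x_1 + x'_1) + 2(x_2 + x'_2) \\ 3(x_1 + x'_1) + 4(x_2 + x'_2) \\ 5(x_1 + x'_1) + 6(x_2 + x'_2) \end{pmatrix} $$

while

$$ Tx + Tx' = \begin{pmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \\ 5x_1 + 6x_2 \end{pmatrix} + \begin{pmatrix} x'_1 + 2x'_2 \\ 3x'_1 + 4x'_2 \\ 5x'_1 + 6x'_2 \end{pmatrix}. $$

One can then see by inspection that T(x + x') and Tx + Tx' are equal.
A similar computation shows that T(cx) = cTx; we leave this as an
exercise.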
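The same check can be carried out on concrete numbers. The following is a minimal sketch in plain Python; the helper names `apply_matrix`, `add` and `scale`, and the sample vectors, are our own choices for illustration and are not part of the notes.

```python
def apply_matrix(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

# The 3 x 2 matrix with rows (1, 2), (3, 4), (5, 6)
A = [[1, 2],
     [3, 4],
     [5, 6]]

x, x_prime, c = [2, -1], [7, 3], 4   # arbitrary sample values

# T(x + x') agrees with Tx + Tx'
assert apply_matrix(A, add(x, x_prime)) == add(apply_matrix(A, x),
                                               apply_matrix(A, x_prime))
# T(cx) agrees with c Tx
assert apply_matrix(A, scale(c, x)) == scale(c, apply_matrix(A, x))

print(apply_matrix(A, x))  # [0, 2, 4]
```

Of course, checking one pair of vectors proves nothing in general; the symbolic computation is what establishes linearity.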


• More generally, any m × n matrix (m rows and n columns) gives rise to
a linear transformation from R^n to R^m. Later on, we shall see that the
converse is true: every linear transformation from R^n to R^m is given
by an m × n matrix. For instance, the transformation T : R^3 → R^3
given by Tx := 5x corresponds to the matrix

$$ \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{pmatrix} $$

(why?), while the identity transformation on R^3 corresponds to the
identity matrix

$$ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

(why?). (What matrix does the zero transformation correspond to?)


Thus matrices provide a good example of linear transformations; but 
they are not the only type of linear transformation (just as row and 
column vectors are not the only type of vectors we study). We now 
give several more examples. 


Reflections as linear transformations. Let R^2 be the plane, and let
T : R^2 → R^2 denote the operation of reflection through the x_1-axis:

T(x_1, x_2) := (x_1, −x_2).

(Now we once again view vectors in R^n as row vectors.) It is
straightforward to verify that this is a linear transformation; indeed,
it corresponds to the matrix

$$ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} $$

(why? - note we are confusing row and column vectors here. We will
clear this confusion up later.) More generally, given any line in R^2
through the origin (or any plane in R^3 through the origin), the
operation of reflection through that line (resp. plane) is a linear
transformation from R^2 to R^2 (resp. R^3 to R^3), as can be seen by
elementary geometry.


Rotations as linear transformations. Let T : R^2 → R^2 denote the
operation of rotation anticlockwise by 90 degrees. A little geometry
shows that

T(x_1, x_2) = (−x_2, x_1).

This is a linear transformation, corresponding to the matrix

$$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. $$

More generally, given any angle θ, the rotation anticlockwise or
clockwise by θ around the origin gives rise to a linear transformation
from R^2 to R^2. In R^3, it doesn’t quite make sense to rotate around
the origin (which way would it spin?), but given any line through the
origin (called the axis of rotation), one can rotate around that line
by an angle θ (though there are two ways one can do it, clockwise or
anticlockwise). We will not cover rotation and reflection matrices in
detail here - that’s a topic for 115B.


Permutation as a linear transformation. Let’s take a standard
vector space, say R^4, and consider the operation of switching the first
and third components:

T(x_1, x_2, x_3, x_4) = (x_3, x_2, x_1, x_4).

This is a linear transformation (why?). It corresponds to the matrix

$$ \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

(why?). This type of operation - the rearranging of the co-ordinates -
is known as a permutation, and the corresponding matrix is known as
a permutation matrix. One property of permutation matrices is that
every row and column contains exactly one 1, with the rest of the entries
being 0.


Differentiation as a linear transformation. Here’s a more interesting
transformation: consider the transformation T : P_n(R) → P_{n−1}(R)
defined by differentiation:

$$ Tf := \frac{df}{dx}. $$

Thus, for instance, if n = 3, then T would send the vector x^3 + 2x +
4 ∈ P_3(R) to the vector 3x^2 + 2 ∈ P_2(R). To show that T preserves
vector addition, pick two polynomials f, g in P_n(R). We have to show
that T(f + g) = Tf + Tg, i.e.

$$ \frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}. $$

But this is just the sum rule for differentiation. A similar argument
shows that T preserves scalar multiplication.
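If we store a polynomial as its list of coefficients, differentiation becomes a very concrete operation on lists. The sketch below assumes the (arbitrary) convention that the coefficient of x^k sits at index k; the function names and the second polynomial g are made up for illustration.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [a_0, a_1, ..., a_n]
    (coefficient of x^k at index k); returns [a_1, 2*a_2, ..., n*a_n]."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def add(p, q):
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

f = [4, 2, 0, 1]    # x^3 + 2x + 4, the example from the notes
g = [1, 0, -5, 2]   # 2x^3 - 5x^2 + 1, an arbitrary second polynomial

print(derivative(f))  # [2, 0, 3], i.e. 3x^2 + 2

# T(f + g) = Tf + Tg and T(cf) = c Tf on these samples
assert derivative(add(f, g)) == add(derivative(f), derivative(g))
assert derivative(scale(7, f)) == scale(7, derivative(f))
```

Note that the output list is one entry shorter than the input, reflecting the fact that T maps P_3(R) into P_2(R).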


The right-shift as a linear transformation. Recall that R^∞ is the
space of all sequences, e.g. R^∞ contains

(x_1, x_2, x_3, x_4, ...)

as a typical vector. Define the right-shift operator U : R^∞ → R^∞ by

U(x_1, x_2, x_3, x_4, ...) := (0, x_1, x_2, x_3, x_4, ...),

i.e. we shift all the entries right by one, and add a zero at the
beginning. This is a linear transformation (why?). However, it cannot
be represented by a matrix since R^∞ is infinite-dimensional (unless
you are willing to consider infinite-dimensional matrices, but that is
another story).


The left-shift as a linear transformation. There is a companion
operator to the right-shift, namely the left-shift operator
U* : R^∞ → R^∞ defined by

U*(x_1, x_2, x_3, x_4, ...) := (x_2, x_3, x_4, ...),

i.e. we shift all the entries left by one, with the x_1 entry
disappearing entirely. It is almost, but not quite, the inverse of the
right-shift operator; more on this later.
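To get a feel for "almost, but not quite, the inverse", one can model a sequence as a Python function from the index n = 1, 2, 3, ... to its n-th entry. This representation, and the names `right_shift`, `left_shift` and `prefix`, are assumptions made purely for this sketch.

```python
def right_shift(x):
    """U: (x_1, x_2, ...) -> (0, x_1, x_2, ...)."""
    return lambda n: 0 if n == 1 else x(n - 1)

def left_shift(x):
    """U*: (x_1, x_2, ...) -> (x_2, x_3, ...)."""
    return lambda n: x(n + 1)

def prefix(x, length=5):
    """First few entries of a sequence, for display."""
    return [x(n) for n in range(1, length + 1)]

def squares(n):
    return n * n   # the sequence (1, 4, 9, 16, ...)

# Shifting right and then left recovers the original sequence...
print(prefix(left_shift(right_shift(squares))))  # [1, 4, 9, 16, 25]
# ...but shifting left and then right loses the first entry.
print(prefix(right_shift(left_shift(squares))))  # [0, 4, 9, 16, 25]
```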


Inclusion as a linear transformation. Strictly speaking, the spaces
R^3 and R^2 are not related: R^2 is not a subspace of R^3, because
two-dimensional vectors are not three-dimensional vectors. Nevertheless,
we can “force” R^2 into R^3 by adding an extra zero on the end of each
two-dimensional vector. The formal way of doing this is introducing
the linear transformation ι : R^2 → R^3 defined by

ι(x_1, x_2) := (x_1, x_2, 0).

Thus R^2 is not directly contained in R^3, but we can make a linear
transformation which embeds R^2 into R^3 anyway via the transformation
ι, which is often called an “inclusion” or “embedding” transformation.
The transformation ι corresponds to the matrix

$$ \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}. $$


Projection as a linear transformation. Conversely, we can squish a
three-dimensional vector into a two-dimensional one by leaving out the
third component. More precisely, we may consider the linear
transformation π : R^3 → R^2 defined by

π(x_1, x_2, x_3) := (x_1, x_2).

This is a linear transformation (why?). It is almost, but not quite, the
inverse of ι; more on this later.
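A two-line experiment shows in what sense the projection only "almost" undoes the inclusion. The function names `iota` and `pi` below simply mirror the two maps; the sample vectors are arbitrary.

```python
def iota(v):
    """Inclusion R^2 -> R^3: append a zero."""
    x1, x2 = v
    return (x1, x2, 0)

def pi(v):
    """Projection R^3 -> R^2: drop the third component."""
    x1, x2, x3 = v
    return (x1, x2)

print(pi(iota((3, 4))))     # (3, 4): projecting after including changes nothing
print(iota(pi((3, 4, 5))))  # (3, 4, 0): the third component 5 has been lost
```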


Conversions as a linear transformation. Linear transformations
arise naturally when converting one type of unit to another. A simple
example is, say, converting yards to feet: x yards becomes 3x feet, thus
demonstrating the linear transformation Tx = 3x. A more sophisticated
example comes from converting a number of atoms - let’s take
hydrogen, carbon, and oxygen - to elementary particles (electrons,
protons, and neutrons). Let’s say that the vector (N_H, N_C, N_O)
represents the number of hydrogen, carbon, and oxygen atoms in a
compound, and (N_e, N_p, N_n) represents the number of electrons,
protons, and neutrons. Since hydrogen consists of one proton and one
electron, carbon consists of six protons, six neutrons, and six
electrons, and oxygen consists of eight protons, eight neutrons, and
eight electrons, the conversion formula is

N_e = N_H + 6N_C + 8N_O
N_p = N_H + 6N_C + 8N_O
N_n = 6N_C + 8N_O

or in other words

$$ \begin{pmatrix} N_e \\ N_p \\ N_n \end{pmatrix} = \begin{pmatrix} 1 & 6 & 8 \\ 1 & 6 & 8 \\ 0 & 6 & 8 \end{pmatrix} \begin{pmatrix} N_H \\ N_C \\ N_O \end{pmatrix}. $$

The matrix

$$ \begin{pmatrix} 1 & 6 & 8 \\ 1 & 6 & 8 \\ 0 & 6 & 8 \end{pmatrix} $$

is thus the conversion matrix from the hydrogen-carbon-oxygen vector
space to the electron-proton-neutron vector space.
(A philosophical question: why are conversions always linear?)
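As a quick illustration of the conversion formula, here is a small sketch that applies it to a water molecule (two hydrogen atoms, no carbon, one oxygen). The choice of water and the names `CONVERSION` and `particles` are ours, not part of the notes.

```python
# Rows: electrons, protons, neutrons; columns: hydrogen, carbon, oxygen.
CONVERSION = [[1, 6, 8],
              [1, 6, 8],
              [0, 6, 8]]

def particles(atoms):
    """Convert (N_H, N_C, N_O) into (N_e, N_p, N_n)."""
    return [sum(a * n for a, n in zip(row, atoms)) for row in CONVERSION]

water = [2, 0, 1]   # H2O: two hydrogen, zero carbon, one oxygen

print(particles(water))             # [10, 10, 8]
# Linearity: two water molecules contain twice as many particles.
print(particles([4, 0, 2]))         # [20, 20, 16]
```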


Population growth as a linear transformation Linear transfor- 
mations are well adapted to handle the growth of heterogeneous pop- 
ulations - populations consisting of more than one type of species or 
creature. A basic example is that of Fibonacci’s rabbits. These are 
pairs of rabbits which reach maturity after one year, and then produce 
one pair of juvenile rabbits for every year after that. Thus, if at one 
year there are A pairs of juvenile rabbits and B pairs of adult rabbits, 
in the next year there will be B pairs of juvenile rabbits (because each 
pair of adult rabbits gives birth to a juvenile pair), and A+ B pairs 
of adult rabbits. Thus one can describe the passage of one year by a 
linear transformation: 


T(A, B) := (B, A+B). 


Thus, for instance, if in the first year there is one pair of juvenile
rabbits, (1,0), in the next year the population vector will be
T(1,0) = (0,1). Then in the year after that it will be T(0,1) = (1,1).
Then T(1,1) = (1,2), then T(1,2) = (2,3), then T(2,3) = (3,5), and so
forth. (We will return to this example and analyze it more carefully
much later in this course.)
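The year-by-year evolution of the rabbit population is easy to reproduce by applying T repeatedly. This is just a sketch; the variable names are our own.

```python
def T(population):
    """One year of growth: (juvenile pairs, adult pairs) -> next year."""
    juveniles, adults = population
    return (adults, juveniles + adults)

state = (1, 0)          # first year: one juvenile pair, no adults
history = [state]
for _ in range(5):
    state = T(state)
    history.append(state)

print(history)
# [(1, 0), (0, 1), (1, 1), (1, 2), (2, 3), (3, 5)]
```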


Electrical circuits as a linear transformation. Many examples of
analog electric circuits, such as amplifiers, capacitors and filters, can
be thought of as linear transformations: they take in some input
(either a voltage or a current) and return an output (also a voltage or a
current). Often the input is not a scalar, but is a function of time (e.g.
for AC circuits), and similarly for the output. Thus a circuit can be
viewed as a transformation from F(R, R) (which represents the input
as a function of time) to F(R, R) (which represents the output as a
function of time). Usually this transformation is linear, provided that
your input is below a certain threshold. (Too much current or voltage
and your circuit might blow out or short-circuit - both very non-linear
effects!) To actually write down what this transformation is
mathematically, though, one usually has to solve a differential equation;
this is important stuff, but is beyond the scope of this course.


As you can see, linear transformations exist in all sorts of fields. (You 
may amuse yourself by finding examples of linear transformations in 
finance, physics, computer science, etc.) 




Math 115A - Week 3 
Textbook sections: 2.1-2.3 
Topics covered: 


• Null spaces and nullity of linear transformations
• Range and rank of linear transformations
• The Dimension Theorem
• Linear transformations and bases
• Co-ordinate bases
• Matrix representation of linear transformations
• Sum, scalar multiplication, and composition of linear transformations


* * * * *


Review of linear transformations 


• A linear transformation is any map T : V → W from one vector space
V to another W such that T preserves vector addition (i.e. T(v + v') =
Tv + Tv' for all v, v' ∈ V) and T preserves scalar multiplication (i.e.
T(cv) = cTv for all scalars c and all v ∈ V).


A map which preserves vector addition is sometimes called additive; a 
map which preserves scalar multiplication is sometimes called homoge- 
neous. 


We gave several examples of linear transformations in the previous 
notes; here are a couple more. 


Sampling as a linear transformation. Recall that F(R, R) is the
space of all functions from R to R. This vector space might be used
to represent, for instance, sound signals f(t). In practice, a measuring
device cannot capture all the information in a signal (which contains
an infinite amount of data); instead it only samples a finite amount, at
some fixed times. For instance, a measuring device might only sample
f(t) for t = 1, 2, 3, 4, 5 (this would correspond to sampling at 1 Hz for
five seconds). This operation can be described by a linear
transformation S : F(R, R) → R^5, defined by

Sf := (f(1), f(2), f(3), f(4), f(5)),

i.e. S transforms a signal f(t) into a five-dimensional vector, consisting
of f sampled at five times. For instance,

S(x^2) = (1, 4, 9, 16, 25)
S(√x) = (√1, √2, √3, √4, √5)

etc. (Why is this map linear?)


One can similarly sample polynomial spaces. For instance, the map
S : P_2(R) → R^3 defined by

Sf := (f(0), f(1), f(2))

is linear.
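Sampling is simple enough to try directly. In the sketch below, functions are ordinary Python callables; the helper name `sample` and the test signals f and g are assumptions chosen for illustration.

```python
def sample(f, times=(1, 2, 3, 4, 5)):
    """The sampling map S: evaluate f at each of the given times."""
    return [f(t) for t in times]

def f(t):
    return t * t        # the signal x^2

def g(t):
    return 3 * t + 1    # an arbitrary second signal

def h(t):
    return 2 * f(t) + g(t)   # the linear combination 2f + g

print(sample(f))  # [1, 4, 9, 16, 25]

# S(2f + g) = 2 S(f) + S(g), entry by entry
assert sample(h) == [2 * a + b for a, b in zip(sample(f), sample(g))]
```

The linearity of S comes down to the fact that addition and scalar multiplication of functions are themselves defined pointwise.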


Interpolation as a linear transformation. Interpolation can be
viewed as the reverse of sampling. For instance, given three numbers
y_1, y_2, y_3, the Lagrange interpolation formula gives us a polynomial
f ∈ P_2(R) such that f(0) = y_1, f(1) = y_2, and f(2) = y_3:

$$ f := y_1 \frac{(x-1)(x-2)}{(0-1)(0-2)} + y_2 \frac{(x-0)(x-2)}{(1-0)(1-2)} + y_3 \frac{(x-0)(x-1)}{(2-0)(2-1)}. $$

One can view this as a linear transformation S : R^3 → P_2(R) defined
by S(y_1, y_2, y_3) := f, e.g.

$$ S(3, 4, 7) = 3 \frac{(x-1)(x-2)}{(0-1)(0-2)} + 4 \frac{(x-0)(x-2)}{(1-0)(1-2)} + 7 \frac{(x-0)(x-1)}{(2-0)(2-1)}. $$

(Why is this linear?) This is the inverse of the transformation S
defined in the previous paragraph - but more on that later.
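The Lagrange formula at the nodes 0, 1, 2 can be expanded mechanically. The sketch below uses exact rational arithmetic from the standard library; the coefficient-list representation (coefficient of x^k at index k) and all function names are our own assumptions.

```python
from fractions import Fraction

NODES = (0, 1, 2)

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def interpolate(ys):
    """Coefficients [c_0, c_1, c_2] of the polynomial of degree at most 2
    taking the values ys at the nodes 0, 1, 2 (Lagrange formula)."""
    result = [Fraction(0)] * 3
    for i, (xi, yi) in enumerate(zip(NODES, ys)):
        basis = [Fraction(1)]
        for j, xj in enumerate(NODES):
            if j != i:
                # multiply by the factor (x - xj) / (xi - xj)
                basis = poly_mul(basis, [Fraction(-xj, xi - xj),
                                         Fraction(1, xi - xj)])
        result = [r + yi * b for r, b in zip(result, basis)]
    return result

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

f = interpolate((3, 4, 7))
print([int(c) for c in f])   # [3, 0, 1], i.e. f = x^2 + 3

# Sampling the interpolant at 0, 1, 2 returns the original data.
assert [evaluate(f, x) for x in NODES] == [3, 4, 7]
```

So in this example the interpolating polynomial simplifies to x^2 + 3, and sampling it at 0, 1, 2 gives back (3, 4, 7).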




• Linear combinations as a linear transformation. Let V be a vector
space, and let v_1, ..., v_n be a set of vectors in V. Then the
transformation T : R^n → V defined by

T(a_1, ..., a_n) := a_1 v_1 + ... + a_n v_n

is a linear transformation (why?). Also, one can express many of the
statements from previous notes in terms of this transformation T. For
instance, span({v_1, ..., v_n}) is the same thing as the image T(R^n) of
T; thus {v_1, ..., v_n} spans V if and only if T is onto. On the other
hand, T is one-to-one if and only if {v_1, ..., v_n} is linearly
independent (more on this later). Thus T is a bijection if and only if
{v_1, ..., v_n} is a basis.


* * * * *


Null spaces and nullity 


• A note on notation: in this week’s notes, we shall often be dealing with
two different vector spaces V and W, so we have two different types of 
vectors. We will try to reserve the letter v to denote vectors in V, and 
w to denote vectors in W, in what follows. 


• Not all linear transformations are alike; for instance, the zero
transformation T : V → W defined by Tv := 0 behaves rather differently
from, say, the identity transformation T : V → V defined by Tv := v.
Now we introduce some characteristics of linear transformations to start
telling them apart.


• Definition. Let T : V → W be a linear transformation. The null space
of T, called N(T), is defined to be the set

N(T) := {v ∈ V : Tv = 0}.


• In other words, the null space consists of all the stuff that T sends
to zero (this is the zero vector 0_W of W, not the zero vector 0_V of
V): N(T) = T^{-1}({0}). Some examples: if T : V → W is the zero
transformation Tv := 0, then the null space N(T) = V. If instead
T : V → V is the identity transformation Tv := v, then N(T) = {0}.
If T : R^3 → R is the linear transformation T(x, y, z) = x + y + z, then
N(T) is the plane {(x, y, z) ∈ R^3 : x + y + z = 0}.




The null space of T is sometimes also called the kernel of T, and is
sometimes denoted ker(T); but we will use the notation N(T) throughout
this course.


The null space N(T) is always a subspace of V; this is an exercise.
Intuitively, the larger the null space, the more T resembles the 0
transformation. The null space also measures the extent to which T fails
to be one-to-one:


Lemma 1. Let T : V → W be a linear transformation. Then T is
one-to-one if and only if N(T) = {0}.


Proof. First suppose that T is one-to-one; we have to show that
N(T) = {0}. First of all, it is clear that 0 ∈ N(T), because T0 = 0.
Now we show that no other element is in N(T). Suppose for
contradiction that there was a non-zero vector v ∈ V such that v ∈ N(T),
i.e. that Tv = 0. Then Tv = T0. But T is one-to-one, so this forces
v = 0, contradiction.


Now suppose that N(T) = {0}; we have to show that T is one-to-one.
In other words, we need to show that whenever Tv = Tv', then we
must have v = v'. So suppose that Tv = Tv'. Then Tv − Tv' = 0, so
that T(v − v') = 0. Thus v − v' ∈ N(T), which means by hypothesis
that v − v' = 0, so v = v', as desired.


Example: Take the transformation T : R^n → V defined by

T(a_1, ..., a_n) := a_1 v_1 + ... + a_n v_n

which we discussed earlier. If {v_1, ..., v_n} is linearly dependent, then
there is a non-zero n-tuple (a_1, ..., a_n) such that
0 = a_1 v_1 + ... + a_n v_n; i.e. N(T) will consist of more than just the
0 vector. Conversely, if N(T) ≠ {0}, then {v_1, ..., v_n} is linearly
dependent. Thus by Lemma 1, T is injective if and only if
{v_1, ..., v_n} is linearly independent.


Since N(T) is a vector space, it has a dimension. We define the nullity
of T to be the dimension of N(T); this may be infinite, if N(T) is
infinite-dimensional. The nullity of T will be denoted nullity(T), thus
nullity(T) = dim(N(T)).




Example: let π : R^5 → R^5 be the operator

π(x_1, x_2, x_3, x_4, x_5) := (x_1, x_2, x_3, 0, 0).

(Why is this linear?) Then

N(π) = {(0, 0, 0, x_4, x_5) : x_4, x_5 ∈ R}

(why?); this is a two-dimensional space (it has a basis consisting of
(0,0,0,1,0) and (0,0,0,0,1)), and so nullity(π) = 2.


Example: By Lemma 1, a transformation is injective if and only if it
has a nullity of 0.


The nullity of T measures how much information (or degrees of freedom)
is lost when applying T. For instance, in the above projection,
two degrees of freedom are lost: the freedom to vary the x_4 and x_5
co-ordinates is lost after applying π. An injective transformation does
not lose any information (if you know Tv, then you can reconstruct v).


* * * * *


Range and rank 


You may have noticed that many concepts in this field seem to come 
in complementary pairs: spanning set versus linearly independent set, 
one-to-one versus onto, etc. Another such pair is null space and range, 
or nullity and rank. 


Definition. The range R(T) of a linear transformation T : V → W is
defined to be the set

R(T) := {Tv : v ∈ V}.

In other words, R(T) is all the stuff that T maps into: R(T) = T(V).
(Unfortunately, the space W is also sometimes called the range of T;
to avoid confusion we will try to refer to W instead as the target space
for T; V is the initial space or domain of T.)


Just as the null space N(T) is always a subspace of V, it can be shown
that R(T) is a subspace of W (this is part of an exercise).




• Examples: If T : V → W is the zero transformation Tv := 0, then
R(T) = {0}. If T : V → V is the identity transformation Tv := v,
then R(T) = V. If T : R^n → V is the transformation

T(a_1, ..., a_n) := a_1 v_1 + ... + a_n v_n

discussed earlier, then R(T) = span({v_1, ..., v_n}).


• Example: A map T : V → W is onto if and only if R(T) = W.


• Definition. The rank rank(T) of a linear transformation T : V → W
is defined to be the dimension of R(T), thus rank(T) = dim(R(T)).


• Examples: The zero transformation has rank 0 (and indeed these are
the only transformations with rank 0). The transformation

π(x_1, x_2, x_3, x_4, x_5) := (x_1, x_2, x_3, 0, 0)

defined earlier has range

R(π) = {(x_1, x_2, x_3, 0, 0) : x_1, x_2, x_3 ∈ R}

(why?), and so has rank 3.


• The rank measures how much information (or degrees of freedom) is
retained by the transformation T. For instance, with the example of
π above, even though two degrees of freedom have been lost, three
degrees of freedom remain.


* * * * *


The dimension theorem


• Let T : V → W be a linear transformation. Intuitively, nullity(T)
measures how many degrees of freedom are lost when applying T; rank(T)
measures how many degrees of freedom are retained. Since the initial
space V originally has dim(V) degrees of freedom, the following
theorem should not be too surprising.


• Dimension Theorem. Let V be a finite-dimensional space, and let
T : V → W be a linear transformation. Then

nullity(T) + rank(T) = dim(V).




• The proof here will involve a lot of shuttling back and forth between
V and W using T, and is an instructive example of how to analyze
linear transformations.


• Proof. By hypothesis, dim(V) is finite; let’s define n := dim(V). Since
N(T) is a subspace of V, it must also be finite-dimensional; let’s call
k := dim(N(T)) = nullity(T). Then we have 0 ≤ k ≤ n. Our task is to
show that k + rank(T) = n, or in other words that dim(R(T)) = n − k.


• By definition of dimension, the space N(T) must have a basis
{v_1, ..., v_k} of k elements. (Probably it has many such bases, but we
just need one such for this argument.) This set of k elements lies in
N(T), and thus in V, and is linearly independent; thus by part (f) of
Corollary 1 of last week’s notes, it must be part of a basis of V, which
must then have n = dim(V) elements (by part (c) of Corollary 1). Thus we
may add n − k extra elements v_{k+1}, ..., v_n to our N(T)-basis to form
a V-basis {v_1, ..., v_n}.


• Since v_{k+1}, ..., v_n lie in V, the elements Tv_{k+1}, ..., Tv_n lie in
R(T). We now claim that {Tv_{k+1}, ..., Tv_n} is a basis for R(T); this
will imply that R(T) has dimension n − k, as desired.


• To verify that {Tv_{k+1}, ..., Tv_n} forms a basis, we must show that
these vectors span R(T) and that they are linearly independent. First
let’s show they span R(T). This means that every vector in R(T) is a
linear combination of Tv_{k+1}, ..., Tv_n. So let’s pick a typical vector
w in R(T); our job is to show that w is a linear combination of
Tv_{k+1}, ..., Tv_n. By definition of R(T), w must equal Tv for some v
in V.


• On the other hand, we know that {v_1, ..., v_n} spans V, thus we must
have

v = a_1 v_1 + ... + a_n v_n

for some scalars a_1, ..., a_n. Applying T to both sides and using the
fact that T is linear, we obtain

Tv = a_1 Tv_1 + ... + a_n Tv_n.


• Now we use the fact that v_1, ..., v_k lie in N(T), so
Tv_1 = ... = Tv_k = 0. Thus

Tv = a_{k+1} Tv_{k+1} + ... + a_n Tv_n.

Thus w = Tv is a linear combination of Tv_{k+1}, ..., Tv_n, as desired.


Now we show that {Tv_{k+1}, ..., Tv_n} is linearly independent. Suppose
for contradiction that this set was linearly dependent, thus

a_{k+1} Tv_{k+1} + ... + a_n Tv_n = 0

for some scalars a_{k+1}, ..., a_n which were not all zero. Then by the
linearity of T again, we have

T(a_{k+1} v_{k+1} + ... + a_n v_n) = 0

and thus by definition of null space

a_{k+1} v_{k+1} + ... + a_n v_n ∈ N(T).

Since N(T) is spanned by {v_1, ..., v_k}, we thus have

a_{k+1} v_{k+1} + ... + a_n v_n = a_1 v_1 + ... + a_k v_k

for some scalars a_1, ..., a_k. We can rearrange this as

−a_1 v_1 − ... − a_k v_k + a_{k+1} v_{k+1} + ... + a_n v_n = 0.

But the set {v_1, ..., v_n} is linearly independent, which means that all
the a’s must then be zero. But that contradicts our hypothesis that
not all of the a_{k+1}, ..., a_n were zero. Thus {Tv_{k+1}, ..., Tv_n} must
have been linearly independent, and we are done.


Example. Let T : R^2 → R^2 denote the linear transformation

T(x, y) := (x + y, 2x + 2y).

The null space of this transformation is

N(T) = {(x, y) ∈ R^2 : x + y = 0}

(why?); this is a line in R^2, and thus has dimension 1 (for instance, it
has {(1, −1)} as a basis). The range of this transformation is

R(T) = {(t, 2t) : t ∈ R}

(why?); this is another line in R^2 and has dimension 1. Since 1 + 1 = 2,
the Dimension Theorem is verified in this case.
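For transformations given by a matrix, the rank and a basis of the null space can be computed by row reduction. Row reduction is not discussed in this part of the notes; it is used here only as a convenient computational device, and every function name below is an assumption made for this sketch. The null-space basis is read off from the free columns and then checked directly against the matrix.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns)."""
    A = [[Fraction(a) for a in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        pivot_row = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot_row is None:
            continue                      # no pivot in this column
        A[r], A[pivot_row] = A[pivot_row], A[r]
        lead = A[r][c]
        A[r] = [a / lead for a in A[r]]   # scale pivot row to have a leading 1
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def null_space_basis(rows):
    """One basis vector of the null space per free (non-pivot) column."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][free]
        basis.append(v)
    return basis

def apply_matrix(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1, 1],
     [2, 2]]          # T(x, y) = (x + y, 2x + 2y)

rank = len(rref(A)[1])
kernel = null_space_basis(A)

# Each computed basis vector really is sent to zero by T.
assert all(apply_matrix(A, v) == [0, 0] for v in kernel)

print(rank, len(kernel))   # 1 1
```

Here the computed null-space vector is (−1, 1), a multiple of the basis vector (1, −1) mentioned above, and rank plus nullity is 1 + 1 = 2 = dim(R^2).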


Example. For the zero transformation Tx := 0, we have nullity(T) =
dim(X) and rank(T) = 0 (so all the degrees of freedom are lost); while
for the identity transformation Tx := x we have nullity(T) = 0 and
rank(T) = dim(X) (so all the degrees of freedom are retained). In both
cases we see that the Dimension Theorem is verified.


One important use of the Dimension Theorem is that it allows us to
discover facts about the range of T just from knowing the null space of
T, and vice versa. For instance:


Example. Let T : P_5(R) → P_4(R) denote the differentiation map

Tf := f';

thus for instance T(x^3 + 2x) = 3x^2 + 2. The null space of T consists of
all polynomials f in P_5(R) for which f' = 0; i.e. the constant
polynomials

N(T) = {c : c ∈ R} = P_0(R).

Thus N(T) has dimension 1 (it has {1} as a basis). Since P_5(R) has
dimension 6, we thus see from the dimension theorem that R(T) must
have dimension 5. But R(T) is a subspace of P_4(R), and P_4(R) has
dimension 5. Thus R(T) must equal all of P_4(R). In other words, every
polynomial of degree at most 4 is the derivative of some polynomial of
degree at most 5. (This is of course easy to check by integration, but
the amazing fact was that we could deduce this fact purely from linear
algebra - using only a very small amount of calculus.)


Here is another example: 


Lemma 2. Let V and W be finite-dimensional vector spaces of the
same dimension (dim(V) = dim(W)), and let T : V → W be a linear
transformation from V to W. Then T is one-to-one if and only if T is
onto.




• Proof. If T is one-to-one, then nullity(T) = 0, which by the dimension
theorem implies that rank(T) = dim(V). Since dim(V) = dim(W), we
thus have dim R(T) = dim(W). But R(T) is a subspace of W, thus
R(T) = W, i.e. T is onto. The reverse implication then follows by
reversing the above steps (we leave as an exercise to verify that all the
steps are indeed reversible).


• Exercise: re-interpret Corollary 1(de) from last week’s notes using this
Lemma, and the linear transformation

T(a_1, ..., a_n) := a_1 v_1 + ... + a_n v_n

discussed earlier.


Linear transformations and bases


• Let T : V → W be a linear transformation, and let {v_1, ..., v_n} be
a collection of vectors in V. Then {Tv_1, ..., Tv_n} is a collection of
vectors in W. We now study how similar these two collections are; for
instance, if one is a basis, does this mean the other one is also a basis?


• Theorem 3. If T : V → W is a linear transformation, and
{v_1, ..., v_n} spans V, then {Tv_1, ..., Tv_n} spans R(T).


• Proof. Let w be any vector in R(T); our job is to show that w is a
linear combination of Tv_1, ..., Tv_n. But by definition of R(T), w = Tv
for some v ∈ V. Since {v_1, ..., v_n} spans V, we thus have
v = a_1 v_1 + ... + a_n v_n for some scalars a_1, ..., a_n. Applying T to
both sides, we obtain Tv = a_1 Tv_1 + ... + a_n Tv_n. Thus we can write
w = Tv as a linear combination of Tv_1, ..., Tv_n, as desired.


• Theorem 4. If T : V → W is a linear transformation which is
one-to-one, and {v_1, ..., v_n} is linearly independent, then
{Tv_1, ..., Tv_n} is also linearly independent.


• Proof. Suppose we can write 0 as a linear combination of
{Tv_1, ..., Tv_n}:

0 = a_1 Tv_1 + ... + a_n Tv_n.

Our job is to show that the a_1, ..., a_n must all be zero. Using the
linearity of T, we obtain

0 = T(a_1 v_1 + ... + a_n v_n).

Since T is one-to-one, N(T) = {0}, and thus

0 = a_1 v_1 + ... + a_n v_n.

But since {v_1, ..., v_n} is linearly independent, this means that
a_1, ..., a_n are all zero, as desired.


• Corollary 5. If T : V → W is both one-to-one and onto, and
{v_1, ..., v_n} is a basis for V, then {Tv_1, ..., Tv_n} is a basis for W.
(In particular, we see that dim(V) = dim(W).)


• Proof. Since {v_1, ..., v_n} is a basis for V, it spans V; and hence, by
Theorem 3, {Tv_1, ..., Tv_n} spans R(T). But R(T) = W since T is onto.
Next, since {v_1, ..., v_n} is linearly independent and T is one-to-one,
we see from Theorem 4 that {Tv_1, ..., Tv_n} is also linearly independent.
Combining these facts we see that {Tv_1, ..., Tv_n} is a basis for W.


• The converse is also true: if T : V → W is one-to-one, and
{Tv_1, ..., Tv_n} is a basis, then {v_1, ..., v_n} is also a basis; we leave
this as an exercise (it’s very similar to the previous arguments).


• Example. The map T : P_3(R) → R^4 defined by

T(ax^3 + bx^2 + cx + d) := (a, b, c, d)

is both one-to-one and onto (why?), and is also linear (why?). Thus we
can convert every basis of P_3(R) to a basis of R^4 and vice versa. For
instance, the standard basis {1, x, x^2, x^3} of P_3(R) can be converted
to the basis {(0,0,0,1), (0,0,1,0), (0,1,0,0), (1,0,0,0)} of R^4. In
principle, this allows one to convert many problems about the vector
space P_3(R) into one about R^4, or vice versa. (The formal way of saying
this is that P_3(R) and R^4 are isomorphic; more about this later.)


* * * * *


Using a basis to specify a linear transformation 




• In this section we discuss one of the fundamental reasons why bases
are important: one can use them to describe linear transformations in
a compact way.


• In general, to specify a function f : X → Y, one needs to describe the
value f(x) for every point x in X; for instance, if f : {1,2,3,4,5} →
R, then one needs to specify f(1), f(2), f(3), f(4), f(5) in order to
completely describe the function. Thus, when X gets large, the amount
of data needed to specify a function can get quite large; for instance,
to specify a function f : R^2 → R^3, one needs to specify a vector
f(x) ∈ R^3 for every single point x in R^2 - and there are infinitely
many such points! The remarkable thing, though, is that if f is linear,
then one does not need to specify f at every single point - one just
needs to specify f on a basis and this will determine the rest of the
function.


• Theorem 6. Let V be a finite-dimensional vector space, and let
{v_1, ..., v_n} be a basis for V. Let W be another vector space, and let
w_1, ..., w_n be some vectors in W. Then there exists exactly one linear
transformation T : V → W such that Tv_j = w_j for each j = 1, 2, ..., n.


• Proof. We need to show two things: firstly, that there exists a linear
transformation T with the desired properties, and secondly that there
is at most one such transformation.


• Let’s first show that there is at most one transformation. Suppose
for contradiction that we had two different linear transformations
T : V → W and U : V → W such that Tv_j = w_j and Uv_j = w_j for each
j = 1, ..., n. Now take any vector v ∈ V, and consider Tv and Uv.


• Since {v_1, ..., v_n} is a basis of V, we have a unique representation

v = a_1 v_1 + a_2 v_2 + ... + a_n v_n

where a_1, ..., a_n are scalars. Thus, since T is linear,

Tv = a_1 Tv_1 + a_2 Tv_2 + ... + a_n Tv_n,

but since Tv_j = w_j, we have

Tv = a_1 w_1 + a_2 w_2 + ... + a_n w_n.

Arguing similarly with U instead of T, we have

Uv = a_1 w_1 + a_2 w_2 + ... + a_n w_n,

so in particular Tv = Uv for all vectors v. Thus T and U are exactly
the same linear transformation, a contradiction. Thus there is at most
one linear transformation.


Now we need to show that there is at least one linear transformation T
for which Tv_j = w_j. To do this, we need to specify Tv for every vector
v ∈ V, and then verify that T is linear. Well, guided by our previous
arguments, we know how to find Tv: we first decompose v as a linear
combination of v_1, ..., v_n

v = a_1 v_1 + ... + a_n v_n

and then define Tv by the formula above:

Tv := a_1 w_1 + ... + a_n w_n.

This is a well-defined construction, since the scalars a_1, ..., a_n are
unique (see the Lemma on page 36 of week 1 notes). To check that
Tv_j = w_j, note that

v_j = 0v_1 + ... + 0v_{j−1} + 1v_j + 0v_{j+1} + ... + 0v_n,

and thus by definition of T

Tv_j = 0w_1 + ... + 0w_{j−1} + 1w_j + 0w_{j+1} + ... + 0w_n = w_j

as desired.


It remains to verify that T is linear; i.e. that T(v + v') = Tv + Tv' and
that T(cv) = cTv for all vectors v, v' ∈ V and scalars c.


We’ll just verify that T(v + v') = Tv + Tv', and leave T(cv) = cTv as
an exercise. Fix any v, v' ∈ V. We can decompose

v = a_1 v_1 + ... + a_n v_n

and

v' = b_1 v_1 + ... + b_n v_n

for some scalars a_1, ..., a_n, b_1, ..., b_n. Thus, by definition of T,

Tv = a_1 w_1 + ... + a_n w_n

and

Tv' = b_1 w_1 + ... + b_n w_n

and thus

Tv + Tv' = (a_1 + b_1)w_1 + ... + (a_n + b_n)w_n.

On the other hand, adding our representations of v and v' we have

v + v' = (a_1 + b_1)v_1 + ... + (a_n + b_n)v_n

and thus by the definition of T again

T(v + v') = (a_1 + b_1)w_1 + ... + (a_n + b_n)w_n

and so T(v + v') = Tv + Tv' as desired. The derivation of T(cv) = cTv
is similar and is left as an exercise. This completes the construction of
T and the verification of the desired properties.


Example: We know that R^2 has {(1,0), (0,1)} as a basis. Thus, by
Theorem 6, for any vector space W and any vectors w_1, w_2 in W, there
is exactly one linear transformation T : R^2 → W such that T(1,0) = w_1
and T(0,1) = w_2. Indeed, this transformation is given by

T(x, y) := x w_1 + y w_2

(why is this transformation linear, and why does it have the desired
properties?).


Example: Let θ be an angle. Suppose we want to understand the
operation Rot_θ : R^2 → R^2 of anti-clockwise rotation of R^2 by θ. From
elementary geometry one can see that this is a linear transformation.
By some elementary trigonometry we see that Rot_θ(1,0) = (cos θ, sin θ)
and Rot_θ(0,1) = (−sin θ, cos θ). Thus from the previous example, we
see that

Rot_θ(x, y) = x(cos θ, sin θ) + y(−sin θ, cos θ).
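The recipe "specify T on a basis, then extend linearly" translates directly into code. The sketch below treats the case V = R^2 with the standard basis and W = R^m; the function names are our own, and floating-point arithmetic means the output is only accurate up to rounding.

```python
import math

def linear_map_from_basis_images(w1, w2):
    """The linear map T : R^2 -> R^m with T(1,0) = w1 and T(0,1) = w2,
    given by T(x, y) = x*w1 + y*w2."""
    def T(x, y):
        return tuple(x * a + y * b for a, b in zip(w1, w2))
    return T

def rotation(theta):
    """Anti-clockwise rotation by theta, built from its values on the basis."""
    return linear_map_from_basis_images(
        (math.cos(theta), math.sin(theta)),     # image of (1, 0)
        (-math.sin(theta), math.cos(theta)),    # image of (0, 1)
    )

quarter_turn = rotation(math.pi / 2)
x, y = quarter_turn(3, 4)
print(round(x, 9), round(y, 9))   # -4.0 3.0
```

This agrees with the 90-degree rotation formula (x_1, x_2) → (−x_2, x_1) seen earlier, which sends (3, 4) to (−4, 3).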


* * * * *


Co-ordinate bases 


Of all the vector spaces, R^n is the easiest to work with; every vector v
consists of nothing more than n separate scalars - the n co-ordinates of
the vector. Vectors from other vector spaces - polynomials, matrices,
etc. - seem to be more complicated to work with. Fortunately, by using
co-ordinate bases, one can convert every (finite-dimensional) vector
space into a space just like R^n.


Definition. Let $V$ be a finite dimensional vector space. An ordered
basis of $V$ is an ordered sequence $(v_1, \ldots, v_n)$ of vectors in $V$ such that
the set $\{v_1, \ldots, v_n\}$ is a basis.


Example The sequence $((1,0,0), (0,1,0), (0,0,1))$ is an ordered basis
of $\mathbf{R}^3$; the sequence $((0,1,0), (1,0,0), (0,0,1))$ is a different ordered
basis of $\mathbf{R}^3$. (Thus sequences are different from sets; rearranging the
elements of a set does not affect the set).


More generally, if we work in $\mathbf{R}^n$, and we let $e_j$ be the vector with $j^{th}$
co-ordinate 1 and all other co-ordinates 0, then the sequence $(e_1, e_2, \ldots, e_n)$
is an ordered basis of $\mathbf{R}^n$, and is known as the standard ordered basis
for $\mathbf{R}^n$. In a similar spirit, $(1, x, x^2, \ldots, x^n)$ is known as the standard
ordered basis for $P_n(\mathbf{R})$.


Ordered bases are also called co-ordinate bases; we shall often give
bases names such as $\beta$. The reason why we need ordered bases is so
that we can refer to the first basis vector, second basis vector, etc. 
(In a set, which is unordered, one cannot refer to the first element, 
second element, etc. - they are all jumbled together and are just plain 
elements). 


Let $\beta = (v_1, \ldots, v_n)$ be an ordered basis for $V$, and let $v$ be a vector in
$V$. From the Lemma on page 36 of Week 1 notes, we know that $v$ has
a unique representation of the form

\[ v = a_1 v_1 + \ldots + a_n v_n. \]

The scalars $a_1, \ldots, a_n$ will be referred to as the co-ordinates of $v$ with
respect to $\beta$, and we define the co-ordinate vector of $v$ relative to $\beta$,
denoted $[v]^\beta$, by

\[ [v]^\beta := \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]


(In the textbook, $[v]_\beta$ is used instead of $[v]^\beta$. I believe this is a mistake
- there is a convention that superscripts should refer to column vectors
and subscripts to row vectors - although this distinction is of course very
minor. This convention becomes very useful in physics, especially when
one begins to study tensors - a generalization of vectors and matrices
- but for this course, please don't worry too much about whether an
index should be a subscript or superscript.)


Example Let's work in $\mathbf{R}^3$, and let $v := (3,4,5)$. If $\beta$ is the standard
ordered basis $\beta := ((1,0,0), (0,1,0), (0,0,1))$, then

\[ [v]^\beta = \begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix} \]

since

\[ (3,4,5) = 3(1,0,0) + 4(0,1,0) + 5(0,0,1). \]

On the other hand, if we use the ordered basis $\beta' := ((0,1,0), (1,0,0), (0,0,1))$,
then

\[ [v]^{\beta'} = \begin{pmatrix} 4 \\ 3 \\ 5 \end{pmatrix} \]

since

\[ (3,4,5) = 4(0,1,0) + 3(1,0,0) + 5(0,0,1). \]

If instead we use the basis $\beta'' := ((3,4,5), (0,1,0), (0,0,1))$, then

\[ [v]^{\beta''} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \]

since

\[ (3,4,5) = 1(3,4,5) + 0(0,1,0) + 0(0,0,1). \]


For more general bases, one would probably have to do some Gaussian 
elimination to work out exactly what the co-ordinate vector is (similar 
to what we did in Week 1). 
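
For a general basis of $\mathbf{R}^n$, the elimination mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `coordinates` is hypothetical, the basis is assumed to consist of $n$ vectors in $\mathbf{R}^n$, and exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Solve a_1*basis[0] + ... + a_n*basis[n-1] = v by Gauss-Jordan elimination.

    `basis` is a sequence of n vectors in R^n and `v` is a vector in R^n;
    the return value is the list of co-ordinates [a_1, ..., a_n].
    """
    n = len(basis)
    # Augmented matrix: column j holds the j-th basis vector, last column holds v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

v = (3, 4, 5)
print(coordinates([(1, 0, 0), (0, 1, 0), (0, 0, 1)], v) == [3, 4, 5])  # True
print(coordinates([(0, 1, 0), (1, 0, 0), (0, 0, 1)], v) == [4, 3, 5])  # True
print(coordinates([(3, 4, 5), (0, 1, 0), (0, 0, 1)], v) == [1, 0, 0])  # True
```

The three calls reproduce the three co-ordinate vectors of $(3,4,5)$ computed by hand in the example.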


Example Now let's work in $P_2(\mathbf{R})$, and let $f = 3x^2 + 4x + 6$. If $\beta$ is
the standard ordered basis $\beta := (1, x, x^2)$, then

\[ [f]^\beta = \begin{pmatrix} 6 \\ 4 \\ 3 \end{pmatrix} \]

since

\[ f = 6 \times 1 + 4 \times x + 3 \times x^2. \]

Or using the reverse standard ordered basis $\beta' := (x^2, x, 1)$, we have

\[ [f]^{\beta'} = \begin{pmatrix} 3 \\ 4 \\ 6 \end{pmatrix} \]

since

\[ f = 3 \times x^2 + 4 \times x + 6 \times 1. \]

Note that while $\begin{pmatrix} 6 \\ 4 \\ 3 \end{pmatrix}$ and $\begin{pmatrix} 3 \\ 4 \\ 6 \end{pmatrix}$ are clearly different column vectors,
they both came from the same object $f$. It's like how one person may
perceive a pole as being 12 feet long and another may perceive it as
being 4 yards long; both are correct, even though 12 is not equal to
4. It's just that one person is using feet as a basis for length and the
other is using yards as a basis for length. (Units of measurement are
to scalars as bases are to vectors. To be pedantic, the space $V$ of all
possible lengths is a one-dimensional vector space, and both (yard)
and (foot) are bases. A length $v$ might be equal to 4 yards, so that
$[v]^{(\mathrm{yard})} = (4)$, while also being equal to 12 feet, so $[v]^{(\mathrm{foot})} = (12)$).




• Given any vector $v$ and any ordered basis $\beta$, we can construct the
co-ordinate vector $[v]^\beta$. Conversely, given the co-ordinate vector

\[ [v]^\beta = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \]

and the ordered basis $\beta = (v_1, \ldots, v_n)$, one can reconstruct $v$ by the
formula

\[ v = a_1 v_1 + \ldots + a_n v_n. \]

Thus, for any fixed basis, one can go back and forth between vectors $v$
and column vectors $[v]^\beta$ without any difficulty.


• Thus, the use of co-ordinate vectors gives us a way to represent any
vector as a familiar column vector, provided that we supply a basis
$\beta$. The above examples show that the choice of basis $\beta$ is important;
different bases give different co-ordinate vectors.


• A philosophical point: This flexibility in choosing bases underlies a basic
fact about the standard Cartesian grid structure, with its $x$ and $y$
axes, etc: it is artificial! (though of course very convenient for
computations). The plane $\mathbf{R}^2$ is a very natural object, but our Cartesian
grid is not (the ancient Greeks were working with the plane back in
300 BC, but Descartes only introduced the grid in the 1600s). Why
couldn't we make, for instance, the x-axis point northwest and the
y-axis point northeast? This would correspond to a different basis (for
instance, using $((1,1), (1,-1))$ instead of $((1,0), (0,1))$) but one could
still do all of geometry, calculus, etc. perfectly well with this grid.


• (The way mathematicians describe this is: the plane is canonical, but
the Cartesian co-ordinate system is non-canonical. Canonical means 
that there is a natural way to define this object uniquely, without 
recourse to any artificial convention.) 


• As we will see later, it does make sense every now and then to shift
one's co-ordinate system to suit the situation - for instance, the above
basis $((1,1), (1,-1))$ might be useful in dealing with shapes which were
always at 45 degree angles to the horizontal (i.e. diamond-shaped
objects). But in the majority of cases, the standard basis suffices, if for
no reason other than tradition.


• The very operation of sending a vector $v$ to its co-ordinate vector $[v]^\beta$ is
itself a linear transformation, from $V$ to $\mathbf{R}^n$: see this week's homework.


* * * * *


The matrix representation of linear transformations 


• We have just seen that by using an ordered basis of V, we can represent
vectors in V as column vectors. Now we show that by using an ordered 
basis of V and another ordered basis of W, we can represent linear 
transformations from V to W as matrices. This is a very fundamental 
observation in this course; it means that from now on, we can study 
linear transformations by focusing on matrices, which is exactly what 
we will be doing for the rest of this course. 


• Specifically, let $V$ and $W$ be finite-dimensional vector spaces, and let
$\beta := (v_1, \ldots, v_n)$ and $\gamma := (w_1, \ldots, w_m)$ be ordered bases for $V$ and $W$
respectively; thus $\{v_1, \ldots, v_n\}$ is a basis for $V$ and $\{w_1, \ldots, w_m\}$ is a
basis for $W$, so that $V$ is $n$-dimensional and $W$ is $m$-dimensional. Let
$T$ be a linear transformation from $V$ to $W$.


• Example Let $V = P_3(\mathbf{R})$, $W = P_2(\mathbf{R})$, and $T : V \to W$ be the
differentiation map $Tf := f'$. We use the standard ordered basis
$\beta := (1, x, x^2, x^3)$ for $V$, and the standard ordered basis $\gamma := (1, x, x^2)$ for
$W$. We shall continue with this example later.


• Returning now to the general situation, let us take a vector $v$ in $V$ and
try to compute $Tv$ using our bases. Since $v$ is in $V$, it has a co-ordinate
representation

\[ [v]^\beta = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \]

with respect to $\beta$. Similarly, since $Tv$ is in $W$, it has a co-ordinate
representation

\[ [Tv]^\gamma = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \]

with respect to $\gamma$. Our question is now: how are the column vectors
$[v]^\beta$ and $[Tv]^\gamma$ related? More precisely, if we know the column vector
$[v]^\beta$, can we work out what $[Tv]^\gamma$ will be? Of course, the answer will
depend on $T$; but as we shall see, we can quantify this more precisely, by
saying that the answer will depend on a certain matrix representation
of $T$ with respect to $\beta$ and $\gamma$.


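• Example Continuing our previous example, let's pick a $v \in P_3(\mathbf{R})$ at
random, say $v := 3x^2 + 7x + 5$, so that

\[ [v]^\beta = \begin{pmatrix} 5 \\ 7 \\ 3 \\ 0 \end{pmatrix}. \]

Then we have $Tv = 6x + 7$, so that

\[ [Tv]^\gamma = \begin{pmatrix} 7 \\ 6 \\ 0 \end{pmatrix}. \]

The question here is this: starting from the column vector $\begin{pmatrix} 5 \\ 7 \\ 3 \\ 0 \end{pmatrix}$ for
$v$, how does one work out the column vector $\begin{pmatrix} 7 \\ 6 \\ 0 \end{pmatrix}$ for $Tv$?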


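• Return now to the general case. From our formula for $[v]^\beta$, we have

\[ v = x_1 v_1 + x_2 v_2 + \ldots + x_n v_n \]

so if we apply $T$ to both sides we obtain

\[ Tv = x_1 Tv_1 + x_2 Tv_2 + \ldots + x_n Tv_n \qquad (0.6) \]

while from our formula for $[Tv]^\gamma$ we have

\[ Tv = y_1 w_1 + y_2 w_2 + \ldots + y_m w_m. \qquad (0.7) \]

Now to connect the two formulae. The vectors $Tv_1, \ldots, Tv_n$ lie in $W$,
and so they are linear combinations of $w_1, \ldots, w_m$:

\[ \begin{aligned}
Tv_1 &= a_{11} w_1 + a_{21} w_2 + \ldots + a_{m1} w_m \\
Tv_2 &= a_{12} w_1 + a_{22} w_2 + \ldots + a_{m2} w_m \\
&\;\;\vdots \\
Tv_n &= a_{1n} w_1 + a_{2n} w_2 + \ldots + a_{mn} w_m;
\end{aligned} \]

note that the numbers $a_{11}, \ldots, a_{mn}$ are scalars that only depend on $T$,
$\beta$, and $\gamma$ (the vector $v$ is only relevant for computing the $x$'s and $y$'s).

Substituting the above formulae into (0.6) we obtain

\[ \begin{aligned}
Tv = {} & x_1 (a_{11} w_1 + \ldots + a_{m1} w_m) \\
& + x_2 (a_{12} w_1 + \ldots + a_{m2} w_m) \\
& \;\;\vdots \\
& + x_n (a_{1n} w_1 + \ldots + a_{mn} w_m).
\end{aligned} \]

Collecting coefficients and comparing this with (0.7) (remembering that
$\{w_1, \ldots, w_m\}$ is a basis, so there is only one way to write $Tv$ as a linear
combination of $w_1, \ldots, w_m$) we obtain

\[ \begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n \\
y_2 &= a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n \\
&\;\;\vdots \\
y_m &= a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n.
\end{aligned} \]

This may look like a mess, but it becomes cleaner in matrix form:

\[ \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} =
\begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]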


Thus, if we define $[T]^\gamma_\beta$ to be the matrix

\[ [T]^\gamma_\beta := \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \]

then we have answered our question of how to link $[v]^\beta$ with $[Tv]^\gamma$:

\[ [Tv]^\gamma = [T]^\gamma_\beta [v]^\beta. \]

(If you like, the $\beta$ subscript on the $T$ has "cancelled" the $\beta$ superscript
on the $v$. This is part of a more general rule, known as the Einstein
summation convention, which you might encounter in advanced physics
courses when dealing with things called tensors).


It is no co-incidence that matrices were so conveniently suitable for this 
problem; in fact matrices were initially invented for the express purpose 
of understanding linear transformations in co-ordinates. 


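Example. We return to our previous example. Note

\[ \begin{aligned}
Tv_1 &= T1 = 0 = 0w_1 + 0w_2 + 0w_3 \\
Tv_2 &= Tx = 1 = 1w_1 + 0w_2 + 0w_3 \\
Tv_3 &= Tx^2 = 2x = 0w_1 + 2w_2 + 0w_3 \\
Tv_4 &= Tx^3 = 3x^2 = 0w_1 + 0w_2 + 3w_3
\end{aligned} \]

and hence

\[ [T]^\gamma_\beta := \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}. \]

Thus $[v]^\beta$ and $[Tv]^\gamma$ are linked by the equation

\[ [Tv]^\gamma = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} [v]^\beta; \]

thus for instance returning to our previous example

\[ \begin{pmatrix} 7 \\ 6 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 5 \\ 7 \\ 3 \\ 0 \end{pmatrix}. \]

The matrix $[T]^\gamma_\beta$ is called the matrix representation of $T$ with respect
to the bases $\beta$ and $\gamma$. Notice that the $j^{th}$ column of $[T]^\gamma_\beta$ is just the
co-ordinate vector of $Tv_j$ with respect to $\gamma$:

\[ [T]^\gamma_\beta = \begin{pmatrix} [Tv_1]^\gamma & [Tv_2]^\gamma & \ldots & [Tv_n]^\gamma \end{pmatrix} \]

(see for instance the previous Example).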


In many cases, $V$ will be equal to $W$, and $\beta$ equal to $\gamma$; in that case we
may abbreviate $[T]^\gamma_\beta$ as $[T]_\beta$.


Just like a vector $v$ can be reconstructed from its co-ordinate vector
$[v]^\beta$ and vice versa (provided one knows what $\beta$ is, of course), a linear
transformation $T$ can be reconstructed from its co-ordinate matrix $[T]^\gamma_\beta$
and vice versa (provided $\beta$ and $\gamma$ are given). Indeed, if one knows $[T]^\gamma_\beta$
then one can work out the rule to get from $v$ to $Tv$ as follows: first
write $v$ in terms of $\beta$, obtaining the co-ordinate vector $[v]^\beta$; multiply
this column vector by $[T]^\gamma_\beta$ to obtain $[Tv]^\gamma$, and then use $\gamma$ to convert
this back into the vector $Tv$.
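
The recipe just described (co-ordinates in, matrix multiplication, co-ordinates out) can be tried on the differentiation example. The following Python sketch is illustrative only: the names `derivative_matrix` and `apply` are hypothetical, and a polynomial is represented directly by its co-ordinate vector in the standard ordered basis $(1, x, \ldots, x^n)$.

```python
def derivative_matrix(n):
    """Matrix of d/dx from P_n(R) to P_{n-1}(R) in the standard ordered bases.

    Column j is the co-ordinate vector of T(x^j) = j * x^(j-1),
    so the only nonzero entry of column j is a j in row j - 1.
    """
    return [[j if i == j - 1 else 0 for j in range(n + 1)] for i in range(n)]

def apply(matrix, column):
    """Multiply a matrix (list of rows) by a column vector (list of scalars)."""
    return [sum(a * x for a, x in zip(row, column)) for row in matrix]

D = derivative_matrix(3)
print(D)            # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

v = [5, 7, 3, 0]    # co-ordinates of 3x^2 + 7x + 5 in (1, x, x^2, x^3)
print(apply(D, v))  # [7, 6, 0], i.e. the polynomial 6x + 7
```

The matrix is built column by column from the images of the basis vectors, matching the observation that the $j^{th}$ column is the co-ordinate vector of $Tv_j$.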


The scalar case All this stuff may seem very abstract and foreign, but
it is just the vector equivalent of something you are already familiar
with in the scalar case: conversion of units. Let's give an example.
Suppose a car is travelling in a straight line at a steady speed $T$ for
a period $v$ of time (yes, the letters are strange, but this is deliberate).
Then the distance that this car traverses is of course $Tv$. Easy enough,
but now let's do everything with units.


Let's say that the period of time $v$ was half an hour, or thirty minutes.
It is not quite accurate to say that $v = 1/2$ or $v = 30$; the precise
statement (in our notation) is that $[v]^{(\mathrm{hour})} = (1/2)$, or $[v]^{(\mathrm{minute})} = (30)$.
(Note that (hour) and (minute) are both ordered bases for time,
which is a one-dimensional vector space). Since our bases just have one
element, our column vector has only one row, which makes it a rather
silly vector in our case.


Now suppose that the speed $T$ was twenty miles an hour. Again, it is
not quite accurate to say that $T = 20$; the correct statement is that

\[ [T]^{(\mathrm{mile})}_{(\mathrm{hour})} = (20) \]

since we clearly have

\[ T(1 \times \mathrm{hour}) = 20 \times \mathrm{mile}. \]

We can also represent $T$ in other units:

\[ [T]^{(\mathrm{mile})}_{(\mathrm{minute})} = (1/3) \]

\[ [T]^{(\mathrm{kilometer})}_{(\mathrm{hour})} = (32) \]

etc. In this case our "matrices" are simply $1 \times 1$ matrices - pretty
boring!


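Now we can work out $Tv$ in miles or kilometers:

\[ [Tv]^{(\mathrm{mile})} = [T]^{(\mathrm{mile})}_{(\mathrm{hour})} [v]^{(\mathrm{hour})} = (20)(1/2) = (10) \]

or to do things another way

\[ [Tv]^{(\mathrm{mile})} = [T]^{(\mathrm{mile})}_{(\mathrm{minute})} [v]^{(\mathrm{minute})} = (1/3)(30) = (10). \]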


Thus the car travels for 10 miles - which was of course obvious from the 
problem. The point here is that these strange matrices and bases are 
not alien objects - they are simply the vector versions of things that 
you have seen even back in elementary school mathematics. 


A matrix example Remember the car company example from Week 
1? Let’s run an example similar to that. Suppose the car company 
needs money and labor to make cars. To keep things very simple, let’s 
suppose that the car company only makes exteriors - doors and wheels. 
Let’s say that there are two types of cars: coupes, which have two doors 
and four wheels, and sedans, which have four doors and four wheels. 
Let’s say that a wheel requires 2 units of money and 3 units of labor, 
while a door requires 4 units of money and 5 units of labor. 




• We're going to have two vector spaces. The first vector space, $V$, is the
space of orders - the car company may have an order $v$ of 2 coupes and
3 sedans, which translates to 16 doors and 20 wheels. Thus

\[ [v]^{(\mathrm{coupe},\mathrm{sedan})} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \]

and

\[ [v]^{(\mathrm{door},\mathrm{wheel})} = \begin{pmatrix} 16 \\ 20 \end{pmatrix}; \]

both (coupe, sedan) and (door, wheel) are ordered bases for $V$. (One
could also make other bases, such as (coupe, wheel), although those are
rather strange).

• The second vector space, $W$, is the space of resources - in this case,
just money and labor. We're only going to use one ordered basis here:
(money, labor).


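• There is an obvious linear transformation $T$ from $V$ to $W$ - the cost
(actually, price is a more accurate name for $T$; cost should really refer
to $Tv$). Thus, for any order $v$ in $V$, $Tv$ is the amount of resources
required to create $v$. By our hypotheses,

\[ T(\mathrm{door}) = 4 \times \mathrm{money} + 5 \times \mathrm{labor} \]

and

\[ T(\mathrm{wheel}) = 2 \times \mathrm{money} + 3 \times \mathrm{labor} \]

so

\[ [T]^{(\mathrm{money},\mathrm{labor})}_{(\mathrm{door},\mathrm{wheel})} = \begin{pmatrix} 4 & 2 \\ 5 & 3 \end{pmatrix}. \]

You may also check that

\[ [T]^{(\mathrm{money},\mathrm{labor})}_{(\mathrm{coupe},\mathrm{sedan})} = \begin{pmatrix} 16 & 24 \\ 22 & 32 \end{pmatrix}. \]

Thus, for our order $v$, the cost to make $v$ can be computed as

\[ [Tv]^{(\mathrm{money},\mathrm{labor})} = [T]^{(\mathrm{money},\mathrm{labor})}_{(\mathrm{door},\mathrm{wheel})} [v]^{(\mathrm{door},\mathrm{wheel})} = \begin{pmatrix} 4 & 2 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} 16 \\ 20 \end{pmatrix} = \begin{pmatrix} 104 \\ 140 \end{pmatrix} \]

or equivalently as

\[ [Tv]^{(\mathrm{money},\mathrm{labor})} = [T]^{(\mathrm{money},\mathrm{labor})}_{(\mathrm{coupe},\mathrm{sedan})} [v]^{(\mathrm{coupe},\mathrm{sedan})} = \begin{pmatrix} 16 & 24 \\ 22 & 32 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 104 \\ 140 \end{pmatrix} \]

i.e. one needs 104 units of money and 140 units of labor to complete
the order. (Can you explain why these two apparently distinct
computations gave exactly the same answer, and why this answer is
actually the correct cost of this order?). Note how the different bases
(coupe, sedan) and (door, wheel) have different advantages and
disadvantages; the (coupe, sedan) basis makes the co-ordinate vector for $v$
nice and simple, while the (door, wheel) basis makes the co-ordinate
matrix for $T$ nice and simple.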
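
The two computations of the cost can be replayed in a few lines of Python. This is just a sketch for checking the arithmetic; the helper `mat_vec` and the variable names are illustrative and not part of the notes.

```python
def mat_vec(matrix, column):
    """Multiply a matrix (list of rows) by a column vector (list of scalars)."""
    return [sum(a * x for a, x in zip(row, column)) for row in matrix]

# Columns: cost of one door and of one wheel, in (money, labor).
T_door_wheel = [[4, 2],
                [5, 3]]
# Columns: cost of one coupe and of one sedan, in (money, labor).
T_coupe_sedan = [[16, 24],
                 [22, 32]]

order_door_wheel = [16, 20]   # the order v in the (door, wheel) basis
order_coupe_sedan = [2, 3]    # the same order v in the (coupe, sedan) basis

print(mat_vec(T_door_wheel, order_door_wheel))    # [104, 140]
print(mat_vec(T_coupe_sedan, order_coupe_sedan))  # [104, 140]
```

Both products describe the same vector $Tv$ in the (money, labor) basis, which is why they agree.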


* * * * *


Things to do with linear transformations 


We know that certain operations can be performed on vectors: they can
be added together, or multiplied by a scalar. Now we will observe
that there are similar operations on linear transformations; they too
can be added together and multiplied by a scalar, and (under
certain conditions) they can also be multiplied with each other.


Definition. Let $V$ and $W$ be vector spaces, and let $S : V \to W$ and
$T : V \to W$ be two linear transformations from $V$ to $W$. We define
the sum $S + T$ of these transformations to be a third transformation
$S + T : V \to W$, defined by

\[ (S+T)(v) := Sv + Tv. \]


Example. Let $S : \mathbf{R}^2 \to \mathbf{R}^2$ be the doubling transformation, defined
by $Sv := 2v$. Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the identity transformation, defined
by $Tv := v$. Then $S + T$ is the tripling transformation

\[ (S+T)v = Sv + Tv = 2v + v = 3v. \]


Lemma 7 The sum of two linear transformations is again a linear 
transformation. 




Proof Let $S : V \to W$, $T : V \to W$ be linear transformations. We
need to show that $S + T : V \to W$ is also linear; i.e. it preserves
addition and preserves scalar multiplication. Let's just show that it
preserves scalar multiplication, i.e. for any $v \in V$ and scalar $c$, we have
to show that

\[ (S+T)(cv) = c(S+T)v. \]

But the left-hand side, by definition, is

\[ S(cv) + T(cv) = cSv + cTv \]

since $S$, $T$ are linear. Similarly, the right-hand side is

\[ c(Sv + Tv) = cSv + cTv \]

by the axioms of vector spaces. Thus the two are equal. The proof
that $S + T$ preserves addition is similar and is left as an exercise.


Note that we can only add two linear transformations $S$, $T$ if they have
the same domain and target space; for instance it is not permitted to
add the identity transformation on $\mathbf{R}^2$ to the identity transformation
on $\mathbf{R}^3$. This is similar to how vectors can only be added if they belong
to the same space; a vector in $\mathbf{R}^2$ cannot be added to a vector in $\mathbf{R}^3$.


Definition. Let $T : V \to W$ be a linear transformation, and let $c$ be
a scalar. We define the scalar multiplication $cT$ of $c$ and $T$ to be the
transformation $cT : V \to W$, defined by

\[ (cT)(v) = c(Tv). \]

It is easy to verify that $cT$ is also a linear transformation; we leave this
as an exercise.


Example Let $S : \mathbf{R}^2 \to \mathbf{R}^2$ be the doubling transformation, defined
by $Sv := 2v$. Then $2S : \mathbf{R}^2 \to \mathbf{R}^2$ is the quadrupling transformation,
defined by $2Sv := 4v$.


Definition Let $\mathcal{L}(V,W)$ be the space of linear transformations from $V$
to $W$.




Example In the examples above, the transformations $S$, $T$, $S+T$, and
$2S$ all belonged to $\mathcal{L}(\mathbf{R}^2, \mathbf{R}^2)$.


Lemma 8 The space $\mathcal{L}(V,W)$ is a subspace of $\mathcal{F}(V,W)$, the space of
all functions from $V$ to $W$. In particular, $\mathcal{L}(V,W)$ is a vector space.


Proof Clearly $\mathcal{L}(V,W)$ is a subset of $\mathcal{F}(V,W)$, since every linear
transformation is a transformation. Also, we have seen that the space
$\mathcal{L}(V,W)$ of linear transformations from $V$ to $W$ is closed under addition
and scalar multiplication. Hence, it is a subspace of the vector space
$\mathcal{F}(V,W)$, and is hence itself a vector space. (Alternatively, one could
verify each of the vector space axioms (I-VIII) in turn for $\mathcal{L}(V,W)$; this
is a tedious but not very difficult exercise).


The next basic operation is that of multiplying or composing two linear 
transformations. 


Definition Let $U$, $V$, $W$ be vector spaces. Let $S : V \to W$ be a
linear transformation from $V$ to $W$, and let $T : U \to V$ be a linear
transformation from $U$ to $V$. Then we define the product or composition
$ST : U \to W$ to be the transformation

\[ ST(v) := S(Tv). \]

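Example Let $U : \mathbf{R}^\infty \to \mathbf{R}^\infty$ be the right shift operator

\[ U(x_1, x_2, \ldots) := (0, x_1, x_2, \ldots). \]

Then the operator $UU = U^2$ is given by

\[ U^2(x_1, x_2, \ldots) = U(U(x_1, x_2, \ldots)) = U(0, x_1, x_2, \ldots) = (0, 0, x_1, x_2, \ldots), \]

i.e. the double right shift.

Example Let $U^* : \mathbf{R}^\infty \to \mathbf{R}^\infty$ be the left-shift operator

\[ U^*(x_1, x_2, \ldots) := (x_2, x_3, \ldots). \]

Then $U^*U$ is the identity map:

\[ U^*U(x_1, x_2, \ldots) = U^*(0, x_1, x_2, \ldots) = (x_1, x_2, \ldots) \]

but $UU^*$ is not:

\[ UU^*(x_1, x_2, \ldots) = U(x_2, x_3, \ldots) = (0, x_2, x_3, \ldots). \]

Thus multiplication of operators is not commutative.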
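
The failure of commutativity can be seen concretely with a small Python sketch. As an assumption made purely for illustration, an infinite sequence with only finitely many nonzero entries is modelled by a finite list (the omitted tail is understood to be all zeros); the function names are hypothetical.

```python
def right_shift(x):
    """U(x1, x2, ...) = (0, x1, x2, ...)."""
    return [0] + list(x)

def left_shift(x):
    """U*(x1, x2, ...) = (x2, x3, ...)."""
    return list(x[1:])

x = [1, 2, 3, 4]
print(left_shift(right_shift(x)))   # [1, 2, 3, 4]  (U*U is the identity)
print(right_shift(left_shift(x)))   # [0, 2, 3, 4]  (UU* loses the first entry)
```

The order of composition matters: applying the right shift first loses no information, while applying the left shift first discards $x_1$ for good.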


• Note that in order for $ST$ to be defined, the target space of $T$ has
to match the initial space of $S$. (This is very closely related to the
fact that in order for matrix multiplication $AB$ to be well defined, the
number of columns of $A$ must equal the number of rows of $B$).


* * * * *


Addition and multiplication of matrices 


• We have just defined addition, scalar multiplication, and composition
of linear transformations. On the other hand, we also know how to 
add, scalar multiply, and multiply matrices. Since linear transforma- 
tions can be represented (via bases) as matrices, it is thus a natural 
question as to whether the linear transform notions of addition, scalar 
multiplication, and composition are in fact compatible with the matrix 
notions of addition, scalar multiplication, and multiplication. This is 
indeed the case; we will now show this. 


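• Lemma 9 Let $V$, $W$ be finite-dimensional spaces with ordered bases $\beta$,
$\gamma$ respectively. Let $S : V \to W$ and $T : V \to W$ be linear transformations
from $V$ to $W$, and let $c$ be a scalar. Then

\[ [S+T]^\gamma_\beta = [S]^\gamma_\beta + [T]^\gamma_\beta \]

and

\[ [cT]^\gamma_\beta = c[T]^\gamma_\beta. \]

• Proof. We'll just prove the second statement, and leave the first as an
exercise. Let's write $\beta = (v_1, \ldots, v_n)$ and $\gamma = (w_1, \ldots, w_m)$, and denote
the matrix $[T]^\gamma_\beta$ by

\[ [T]^\gamma_\beta = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}. \]

Thus

\[ \begin{aligned}
Tv_1 &= a_{11} w_1 + a_{21} w_2 + \ldots + a_{m1} w_m \\
Tv_2 &= a_{12} w_1 + a_{22} w_2 + \ldots + a_{m2} w_m \\
&\;\;\vdots \\
Tv_n &= a_{1n} w_1 + a_{2n} w_2 + \ldots + a_{mn} w_m.
\end{aligned} \]

Multiplying by $c$, we obtain

\[ \begin{aligned}
(cT)v_1 &= ca_{11} w_1 + ca_{21} w_2 + \ldots + ca_{m1} w_m \\
(cT)v_2 &= ca_{12} w_1 + ca_{22} w_2 + \ldots + ca_{m2} w_m \\
&\;\;\vdots \\
(cT)v_n &= ca_{1n} w_1 + ca_{2n} w_2 + \ldots + ca_{mn} w_m
\end{aligned} \]

and thus

\[ [cT]^\gamma_\beta = \begin{pmatrix} ca_{11} & ca_{12} & \ldots & ca_{1n} \\ ca_{21} & ca_{22} & \ldots & ca_{2n} \\ \vdots & \vdots & & \vdots \\ ca_{m1} & ca_{m2} & \ldots & ca_{mn} \end{pmatrix}, \]

i.e. $[cT]^\gamma_\beta = c[T]^\gamma_\beta$ as desired.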


• We'll leave the corresponding statement connecting composition of linear
transformations with matrix multiplication for next week's notes.




Math 115A - Week 4 
Textbook sections: 2.3-2.4 
Topics covered: 


• A quick review of matrices
• Co-ordinate matrices and composition
• Matrices as linear transformations
• Invertible linear transformations (isomorphisms)
• Isomorphic vector spaces


* * * * *


A quick review of matrices 


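An $m \times n$ matrix is a collection of $mn$ scalars, organized into $m$ rows
and $n$ columns:

\[ A = \begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1n} \\ A_{21} & A_{22} & \ldots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \ldots & A_{mn} \end{pmatrix}. \]

If $A$ is a matrix, then $A_{jk}$ refers to the scalar entry in the $j^{th}$ row and
$k^{th}$ column. Thus if

\[ A := \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \]

then $A_{11} = 1$, $A_{12} = 2$, $A_{21} = 3$, and $A_{22} = 4$.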


(The word “matrix” is late Latin for “womb”; it is the same root as 
maternal or matrimony. The idea being that a matrix is a receptacle 
for holding numbers. Thus the title of the recent Hollywood movie “the 
Matrix” is a play on words). 




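• A special example of a matrix is the $n \times n$ identity matrix $I_n$, defined
by

\[ I_n := \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix} \]

or equivalently that $(I_n)_{jk} := 1$ when $j = k$ and $(I_n)_{jk} := 0$ when $j \neq k$.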


• If $A$ and $B$ are two $m \times n$ matrices, the sum $A + B$ is another $m \times n$
matrix, defined by adding each component separately, for instance

\[ (A+B)_{11} = A_{11} + B_{11} \]

and more generally

\[ (A+B)_{jk} = A_{jk} + B_{jk}. \]

If $A$ and $B$ have different shapes, then $A + B$ is left undefined.


• The scalar product $cA$ of a scalar $c$ and a matrix $A$ is defined by
multiplying each component of the matrix by $c$:

\[ (cA)_{jk} := cA_{jk}. \]


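• If $A$ is an $m \times n$ matrix, and $B$ is an $l \times m$ matrix, then the matrix
product $BA$ is an $l \times n$ matrix, whose co-ordinates are given by the
formula

\[ (BA)_{jk} = B_{j1} A_{1k} + B_{j2} A_{2k} + \ldots + B_{jm} A_{mk} = \sum_{i=1}^m B_{ji} A_{ik}. \]

Thus for instance if

\[ A := \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \]

and

\[ B := \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]

then

\[ (BA)_{11} = B_{11} A_{11} + B_{12} A_{21}; \quad (BA)_{12} = B_{11} A_{12} + B_{12} A_{22} \]

\[ (BA)_{21} = B_{21} A_{11} + B_{22} A_{21}; \quad (BA)_{22} = B_{21} A_{12} + B_{22} A_{22} \]

and so

\[ BA = \begin{pmatrix} B_{11} A_{11} + B_{12} A_{21} & B_{11} A_{12} + B_{12} A_{22} \\ B_{21} A_{11} + B_{22} A_{21} & B_{21} A_{12} + B_{22} A_{22} \end{pmatrix} \]

or in other words

\[ \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} B_{11} A_{11} + B_{12} A_{21} & B_{11} A_{12} + B_{12} A_{22} \\ B_{21} A_{11} + B_{22} A_{21} & B_{21} A_{12} + B_{22} A_{22} \end{pmatrix}. \]

If the number of columns of $B$ does not equal the number of rows of
$A$, then $BA$ is left undefined. Thus for instance it is possible for $BA$
to be defined while $AB$ remains undefined.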
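
The entry formula for $BA$ translates directly into code. The following Python sketch is illustrative (the name `mat_mul` is hypothetical, and matrices are assumed to be non-empty lists of equal-length rows); it also shows a case where $BA$ is defined but $AB$ is not.

```python
def mat_mul(B, A):
    """Return the product BA, where B is l x m and A is m x n (lists of rows)."""
    l, m = len(B), len(B[0])
    if len(A) != m:
        raise ValueError("number of columns of B must equal number of rows of A")
    n = len(A[0])
    # (BA)_{jk} = sum over i of B_{ji} * A_{ik}
    return [[sum(B[j][i] * A[i][k] for i in range(m)) for k in range(n)]
            for j in range(l)]

B = [[1, 2, 3]]                  # a 1 x 3 matrix
A = [[1, 0], [0, 1], [2, 2]]     # a 3 x 2 matrix
print(mat_mul(B, A))             # [[7, 8]]

try:
    mat_mul(A, B)                # 3 x 2 times 1 x 3: shapes do not match
except ValueError as err:
    print("AB is undefined:", err)
```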


• This matrix multiplication rule may seem strange, but we will explain
why it is natural below. 


• It is an easy exercise to show that if $A$ is an $m \times n$ matrix, then
$I_m A = A$ and $A I_n = A$. Thus the matrices $I_m$ and $I_n$ are multiplicative
identities, assuming that the shapes of all the matrices are such that
matrix multiplication is defined.


* * * * *


Co-ordinate matrices and composition 


• Last week, we introduced the notion of a linear transformation
$T : X \to Y$. Given two linear transformations $T : X \to Y$ and $S : Y \to Z$,
where the target space of $T$ matches up with the initial space of $S$,
their composition $ST : X \to Z$, defined by

\[ ST(v) = S(Tv) \]

is also a linear transformation; this is easy to check and I'll leave it as
an exercise. Also, if $I_X : X \to X$ is the identity on $X$ and $I_Y : Y \to Y$
is the identity on $Y$, it is easy to check that $T I_X = T$ and $I_Y T = T$.




• Example Suppose we are considering combinations of two molecules:
methane $CH_4$ and water $H_2O$. Let $X$ be the space of all linear
combinations of such molecules, thus $X$ is a two-dimensional space with
$\alpha := (\mathrm{methane}, \mathrm{water})$ as an ordered basis. (A typical element of $X$
might be $3 \times \mathrm{methane} + 2 \times \mathrm{water}$). Let $Y$ be the space of all linear
combinations of Hydrogen, Carbon, and Oxygen atoms; this is a
three-dimensional space with $\beta := (\mathrm{hydrogen}, \mathrm{carbon}, \mathrm{oxygen})$ as an
ordered basis. Let $Z$ be the space of all linear combinations of electrons,
protons, and neutrons, thus it is a three-dimensional space with
$\gamma := (\mathrm{electron}, \mathrm{proton}, \mathrm{neutron})$ as a basis. There is an obvious linear
transformation $T : X \to Y$, defined by starting with a collection of
molecules and breaking them up into component atoms. Thus

\[ T(\mathrm{methane}) = 4 \times \mathrm{hydrogen} + 1 \times \mathrm{carbon} \]

\[ T(\mathrm{water}) = 2 \times \mathrm{hydrogen} + 1 \times \mathrm{oxygen} \]

and so $T$ has the matrix

\[ [T]^\beta_\alpha = [T]^{(\mathrm{hydrogen},\mathrm{carbon},\mathrm{oxygen})}_{(\mathrm{methane},\mathrm{water})} = \begin{pmatrix} 4 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

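Similarly, there is an obvious linear transformation $S : Y \to Z$, defined
by starting with a collection of atoms and breaking them up into
component particles. Thus

\[ S(\mathrm{hydrogen}) = 1 \times \mathrm{electron} + 1 \times \mathrm{proton} \]

\[ S(\mathrm{carbon}) = 6 \times \mathrm{electron} + 6 \times \mathrm{proton} + 6 \times \mathrm{neutron} \]

\[ S(\mathrm{oxygen}) = 8 \times \mathrm{electron} + 8 \times \mathrm{proton} + 8 \times \mathrm{neutron}. \]

Thus

\[ [S]^\gamma_\beta = [S]^{(\mathrm{electron},\mathrm{proton},\mathrm{neutron})}_{(\mathrm{hydrogen},\mathrm{carbon},\mathrm{oxygen})} = \begin{pmatrix} 1 & 6 & 8 \\ 1 & 6 & 8 \\ 0 & 6 & 8 \end{pmatrix}. \]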


The composition $ST : X \to Z$ of $S$ and $T$ is thus the transformation
which sends molecules to their component particles. (Note that even
though $S$ is to the left of $T$, the operation $T$ is applied first. This
rather unfortunate fact occurs because the conventions of mathematics
place the operator $T$ before the operand $x$, thus we have $T(x)$ instead
of $(x)T$. Since all the conventions are pretty much entrenched, there's
not much we can do about it). A brief calculation shows that

\[ ST(\mathrm{methane}) = 10 \times \mathrm{electron} + 10 \times \mathrm{proton} + 6 \times \mathrm{neutron} \]

\[ ST(\mathrm{water}) = 10 \times \mathrm{electron} + 10 \times \mathrm{proton} + 8 \times \mathrm{neutron} \]

and hence

\[ [ST]^\gamma_\alpha = [ST]^{(\mathrm{electron},\mathrm{proton},\mathrm{neutron})}_{(\mathrm{methane},\mathrm{water})} = \begin{pmatrix} 10 & 10 \\ 10 & 10 \\ 6 & 8 \end{pmatrix}. \]

Now we ask the following question: how are these matrices $[T]^\beta_\alpha$, $[S]^\gamma_\beta$,
and $[ST]^\gamma_\alpha$ related?


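• Let's consider the 10 entry on the top left of $[ST]^\gamma_\alpha$. This number
measures how many electrons there are in a methane molecule. From
the matrix of $[T]^\beta_\alpha$ we see that each methane molecule has 4 hydrogen, 1
carbon, and 0 oxygen atoms. Since hydrogen has 1 electron, carbon has
6, and oxygen has 8, we see that the number of electrons in methane is

\[ 4 \times 1 + 1 \times 6 + 0 \times 8 = 10. \]

Arguing similarly for the other entries of $[ST]^\gamma_\alpha$, we see that

\[ [ST]^\gamma_\alpha = \begin{pmatrix} 4 \times 1 + 1 \times 6 + 0 \times 8 & 2 \times 1 + 0 \times 6 + 1 \times 8 \\ 4 \times 1 + 1 \times 6 + 0 \times 8 & 2 \times 1 + 0 \times 6 + 1 \times 8 \\ 4 \times 0 + 1 \times 6 + 0 \times 8 & 2 \times 0 + 0 \times 6 + 1 \times 8 \end{pmatrix}. \]

But this is just the matrix product of $[S]^\gamma_\beta$ and $[T]^\beta_\alpha$:

\[ [ST]^\gamma_\alpha = \begin{pmatrix} 1 & 6 & 8 \\ 1 & 6 & 8 \\ 0 & 6 & 8 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = [S]^\gamma_\beta [T]^\beta_\alpha. \]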
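
The particle count can be double-checked by multiplying the two matrices in Python. This is only a sketch for verifying the arithmetic of the example; `mat_mul` is an illustrative helper written for this purpose.

```python
def mat_mul(B, A):
    """Return the matrix product BA for matrices given as lists of rows."""
    # zip(*A) iterates over the columns of A.
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)]
            for row in B]

# Columns: methane, water; rows: hydrogen, carbon, oxygen.
T = [[4, 2],
     [1, 0],
     [0, 1]]
# Columns: hydrogen, carbon, oxygen; rows: electron, proton, neutron.
S = [[1, 6, 8],
     [1, 6, 8],
     [0, 6, 8]]

print(mat_mul(S, T))  # [[10, 10], [10, 10], [6, 8]]
```

The output agrees with the matrix of $ST$ obtained above by counting particles directly.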


• More generally, we have

• Theorem 1. Suppose that $X$ is $l$-dimensional and has an ordered basis
$\alpha = (u_1, \ldots, u_l)$, $Y$ is $m$-dimensional and has an ordered basis
$\beta = (v_1, \ldots, v_m)$, and $Z$ is $n$-dimensional and has a basis $\gamma$ of $n$ elements.
Let $T : X \to Y$ and $S : Y \to Z$ be linear transformations. Then

\[ [ST]^\gamma_\alpha = [S]^\gamma_\beta [T]^\beta_\alpha. \]


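• Proof. The transformation $T$ has a co-ordinate matrix $[T]^\beta_\alpha$, which is
an $m \times l$ matrix. If we write

\[ [T]^\beta_\alpha =: \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1l} \\ a_{21} & a_{22} & \ldots & a_{2l} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{ml} \end{pmatrix} \]

then we have

\[ \begin{aligned}
Tu_1 &= a_{11} v_1 + a_{21} v_2 + \ldots + a_{m1} v_m \\
Tu_2 &= a_{12} v_1 + a_{22} v_2 + \ldots + a_{m2} v_m \\
&\;\;\vdots \\
Tu_l &= a_{1l} v_1 + a_{2l} v_2 + \ldots + a_{ml} v_m.
\end{aligned} \]

We write this more compactly as

\[ Tu_i = \sum_{j=1}^m a_{ji} v_j \quad \text{for } i = 1, \ldots, l. \]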


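• Similarly, writing $\gamma = (w_1, \ldots, w_n)$, $S$ has a co-ordinate matrix $[S]^\gamma_\beta$,
which is an $n \times m$ matrix. If

\[ [S]^\gamma_\beta := \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1m} \\ b_{21} & b_{22} & \ldots & b_{2m} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nm} \end{pmatrix} \]

then

\[ Sv_j = \sum_{k=1}^n b_{kj} w_k \quad \text{for } j = 1, \ldots, m. \]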




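Now we try to understand how $ST$ acts on the basis $u_1, \ldots, u_l$.
Applying $S$ to both sides of the $T$ equations, and using the fact that $S$ is
linear, we obtain

\[ STu_i = \sum_{j=1}^m a_{ji} S v_j. \]

Applying our formula for $Sv_j$, we obtain

\[ STu_i = \sum_{j=1}^m a_{ji} \sum_{k=1}^n b_{kj} w_k \]

which we can rearrange as

\[ STu_i = \sum_{k=1}^n \Big( \sum_{j=1}^m b_{kj} a_{ji} \Big) w_k. \]

Thus if we define

\[ c_{ki} := \sum_{j=1}^m b_{kj} a_{ji} = b_{k1} a_{1i} + b_{k2} a_{2i} + \ldots + b_{km} a_{mi} \]

then we have

\[ STu_i = \sum_{k=1}^n c_{ki} w_k \]

and hence

\[ [ST]^\gamma_\alpha = \begin{pmatrix} c_{11} & c_{12} & \ldots & c_{1l} \\ c_{21} & c_{22} & \ldots & c_{2l} \\ \vdots & \vdots & & \vdots \\ c_{n1} & c_{n2} & \ldots & c_{nl} \end{pmatrix}. \]

However, if we perform the matrix multiplication

\[ \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1m} \\ b_{21} & b_{22} & \ldots & b_{2m} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nm} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1l} \\ a_{21} & a_{22} & \ldots & a_{2l} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{ml} \end{pmatrix} \]

we get exactly the same matrix (this is because of our formula for $c_{ki}$
in terms of the $b$ and $a$ co-efficients). This proves the theorem.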




• This theorem illustrates why matrix multiplication is defined in that
strange way - multiplying rows against columns, etc. It also explains
why we need the number of columns of the left matrix to equal the
number of rows of the right matrix; this is like how to compose two
transformations $T : X \to Y$ and $S : Y \to Z$ to form a transformation
$ST : X \to Z$, we need the target space of $T$ to equal the initial space
of $S$.


* * * * *


Comparison between linear transformations and matrices 


• To summarize what we have done so far:


• Given a vector space $X$ and an ordered basis $\alpha$ for $X$, one can write
vectors $v$ in $X$ as column vectors $[v]^\alpha$. Given two vector spaces $X$, $Y$,
and ordered bases $\alpha$, $\beta$ for $X$ and $Y$ respectively, we can write linear
transformations $T : X \to Y$ as matrices $[T]^\beta_\alpha$. The action of $T$ then
corresponds to matrix multiplication by $[T]^\beta_\alpha$:

\[ [Tv]^\beta = [T]^\beta_\alpha [v]^\alpha; \]

i.e. we can "cancel" the basis $\alpha$. Similarly, composition of two linear
transformations corresponds to matrix multiplication: if $S : Y \to Z$
and $\gamma$ is an ordered basis for $Z$, then

\[ [ST]^\gamma_\alpha = [S]^\gamma_\beta [T]^\beta_\alpha \]

i.e. we can "cancel" the basis $\beta$.


• Thus, by using bases, one can understand the behavior of linear
transformations in terms of matrix multiplication. This is not quite saying
that linear transformations are the same as matrices, for two reasons: 
firstly, this correspondence only works for finite dimensional spaces X, 
Y, Z; and secondly, the matrix you get depends on the basis you choose 
- a single linear transformation can correspond to many different ma- 
trices, depending on what bases one picks. 


• To clarify the relationship between linear transformations and matrices
let us once again turn to the scalar case, and now consider currency
conversions. Let $X$ be the space of US currency - this is the
one-dimensional space which has (dollar) as an (ordered) basis; (cent) is
also a basis. Let $Y$ be the space of British currency (with (pound) or
(penny) as a basis; pound = 100 × penny), and let $Z$ be the space of
Japanese currency (with (yen) as a basis). Let $T : X \to Y$ be the
operation of converting US currency to British, and $S : Y \to Z$ the
operation of converting British currency to Japanese, thus $ST : X \to Z$
is the operation of converting US currency to Japanese (via British).


Suppose that one dollar converted to half a pound; then we would have

$$[T]_{(\text{dollar})}^{(\text{pound})} = (0.5),$$

or in different bases

$$[T]_{(\text{cent})}^{(\text{pound})} = (0.005); \quad [T]_{(\text{cent})}^{(\text{penny})} = (0.5); \quad [T]_{(\text{dollar})}^{(\text{penny})} = (50).$$

Thus the same linear transformation $T$ corresponds to many different
1 × 1 matrices, depending on the choice of bases both for the domain $X$
and the range $Y$. However, conversion works properly no matter what
basis you pick (as long as you are consistent), e.g.

$$[T(6 \times \text{dollar})]^{(\text{pound})} = [T]_{(\text{dollar})}^{(\text{pound})} [6 \times \text{dollar}]^{(\text{dollar})} = (0.5)(6) = (3).$$

Furthermore, if each pound converted to 200 yen, so that

$$[S]_{(\text{pound})}^{(\text{yen})} = (200),$$

then we can work out the various matrices for $ST$ by matrix
multiplication (which in the 1 × 1 case is just scalar multiplication):

$$[ST]_{(\text{dollar})}^{(\text{yen})} = [S]_{(\text{pound})}^{(\text{yen})} [T]_{(\text{dollar})}^{(\text{pound})} = (200)(0.5) = (100).$$

One can of course do this computation in different bases, but still get
the same result, since the intermediate basis just cancels itself out at
the end:

$$[ST]_{(\text{dollar})}^{(\text{yen})} = [S]_{(\text{penny})}^{(\text{yen})} [T]_{(\text{dollar})}^{(\text{penny})} = (2)(50) = (100),$$

etc.
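
As a sanity check, the currency computations above can be replayed in a few lines of Python. This is only a sketch: the names `UNIT_SIZE` and `matrix_entry` are made up for illustration, the exchange rates are the hypothetical ones used in the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Size of each basis unit, measured in the "main" unit of its own currency
# (dollars for US, pounds for British, yen for Japanese currency).
UNIT_SIZE = {
    "dollar": Fraction(1), "cent": Fraction(1, 100),
    "pound": Fraction(1), "penny": Fraction(1, 100),
    "yen": Fraction(1),
}

def matrix_entry(rate, domain_unit, range_unit):
    """Single entry of the 1 x 1 matrix of a conversion, where `rate` is the
    number of main range units obtained per main domain unit."""
    return rate * UNIT_SIZE[domain_unit] / UNIT_SIZE[range_unit]

T_RATE = Fraction(1, 2)  # assumed in the example: one dollar is half a pound
S_RATE = Fraction(200)   # assumed in the example: one pound is 200 yen

T_dollar_pound = matrix_entry(T_RATE, "dollar", "pound")
T_cent_pound = matrix_entry(T_RATE, "cent", "pound")
T_cent_penny = matrix_entry(T_RATE, "cent", "penny")
T_dollar_penny = matrix_entry(T_RATE, "dollar", "penny")
S_pound_yen = matrix_entry(S_RATE, "pound", "yen")
S_penny_yen = matrix_entry(S_RATE, "penny", "yen")

# The matrix of ST from (dollar) to (yen), computed through two different
# intermediate bases; the intermediate basis cancels in both cases.
ST_via_pound = S_pound_yen * T_dollar_pound
ST_via_penny = S_penny_yen * T_dollar_penny
print(ST_via_pound, ST_via_penny)
```

Changing the intermediate basis changes both factors (200 and 0.5 become 2 and 50), but not their product.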




• You might amuse yourself concocting a vector example of currency
conversion - for instance, suppose that in some country there was more
than one type of currency, and they were not freely interconvertible.
A US dollar might then convert to x amounts of one currency plus y
amounts of another, and so forth. Then you could repeat the above
computations except that the scalars would have to be replaced by
various vectors and matrices.


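• One basic example of a linear transformation is the identity
transformation $I_V : V \to V$ on a vector space $V$, defined by $I_V v = v$.
If we pick any basis $\beta = (v_1, \ldots, v_n)$ of $V$, then of course we have

$$\begin{aligned}
I_V v_1 &= 1 \times v_1 + 0 \times v_2 + \ldots + 0 \times v_n \\
I_V v_2 &= 0 \times v_1 + 1 \times v_2 + \ldots + 0 \times v_n \\
&\;\;\vdots \\
I_V v_n &= 0 \times v_1 + 0 \times v_2 + \ldots + 1 \times v_n,
\end{aligned}$$

and thus

$$[I_V]_\beta^\beta = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix}.$$

Thus the identity transformation is connected to the identity matrix.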


* * * * *


Matrices as linear transformations. 


• We have now seen how linear transformations can be viewed as matrices
(after selecting bases, etc.). Conversely, every matrix can be viewed as
a linear transformation.


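• Definition Let $A$ be an $m \times n$ matrix. Then we define the linear
transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$ by the rule

$$L_A x := Ax \text{ for all } x \in \mathbf{R}^n,$$

where we think of the vectors in $\mathbf{R}^n$ and $\mathbf{R}^m$ as column vectors.


Example Let $A$ be the matrix

$$A := \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}.$$

Then $L_A : \mathbf{R}^2 \to \mathbf{R}^3$ is the linear transformation

$$L_A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \\ 5x_1 + 6x_2 \end{pmatrix}.$$


It is easily checked that $L_A$ is indeed linear. Thus for every $m \times n$ matrix
$A$ we can associate a linear transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$. Conversely,
if we let $\alpha$ be the standard basis for $\mathbf{R}^n$ and $\beta$ be the standard basis
for $\mathbf{R}^m$, then for every linear transformation $T : \mathbf{R}^n \to \mathbf{R}^m$ we can
associate an $m \times n$ matrix $[T]_\alpha^\beta$. The following simple lemma shows
that these two operations invert each other:


Lemma 2. Let the notation be as above. If $A$ is an $m \times n$ matrix, then
$[L_A]_\alpha^\beta = A$. If $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation, then $L_{[T]_\alpha^\beta} = T$.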


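Proof Let $\alpha = (e_1, e_2, \ldots, e_n)$ be the standard basis of $\mathbf{R}^n$. For any
column vector

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

in $\mathbf{R}^n$, we have

$$x = x_1 e_1 + \ldots + x_n e_n$$

and thus

$$[x]^\alpha = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x.$$

Thus $[x]^\alpha = x$ for all $x \in \mathbf{R}^n$. Similarly we have $[y]^\beta = y$ for all $y \in \mathbf{R}^m$.
Now let $A$ be an $m \times n$ matrix, and let $x \in \mathbf{R}^n$. By definition

$$L_A x = Ax.$$

On the other hand, we have

$$[L_A x]^\beta = [L_A]_\alpha^\beta [x]^\alpha$$

and hence (by the previous discussion)

$$L_A x = [L_A]_\alpha^\beta x.$$

Thus

$$[L_A]_\alpha^\beta x = Ax \text{ for all } x \in \mathbf{R}^n.$$

If we apply this with $x$ equal to the first basis vector $e_1 = (1, 0, \ldots, 0)^T$, we see that
the first columns of the matrices $[L_A]_\alpha^\beta$ and $A$ are equal. Similarly we
see that all the other columns of $[L_A]_\alpha^\beta$ and $A$ match, so that $[L_A]_\alpha^\beta = A$
as desired.


Now let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Then for any $x \in \mathbf{R}^n$

$$[Tx]^\beta = [T]_\alpha^\beta [x]^\alpha$$

which by previous discussion implies that

$$Tx = [T]_\alpha^\beta x = L_{[T]_\alpha^\beta} x.$$

Thus $T$ and $L_{[T]_\alpha^\beta}$ are the same linear transformation, and the lemma
is proved.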
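
The two directions of Lemma 2 can be tried out numerically. The following Python sketch uses two hypothetical helpers, `make_L` and `matrix_of` (names invented here), with matrices stored as lists of rows: it builds $L_A$ from the matrix $A$ of the earlier example and then recovers $A$ by applying $L_A$ to the standard basis vectors, column by column.

```python
def make_L(A):
    """Return the map x -> Ax for a matrix A given as a list of rows."""
    def L(x):
        return [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return L

def matrix_of(T, n):
    """Matrix of a linear map T : R^n -> R^m in the standard bases.
    Column j is T applied to the j-th standard basis vector."""
    columns = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

A = [[1, 2], [3, 4], [5, 6]]   # the 3 x 2 matrix from the example
L_A = make_L(A)                # a map from R^2 to R^3

recovered = matrix_of(L_A, 2)  # should reproduce A
image = L_A([1, 1])            # (1 + 2, 3 + 4, 5 + 6)
print(recovered == A, image)
```

This is of course a check on one example, not a proof; the proof is the argument given above.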


Because of the above lemma, any result we can say about linear trans- 
formations, one can also say about matrices. For instance, the following 
result is trivial for linear transformations: 


Lemma 3. (Composition is associative) Let $T : X \to Y$, $S : Y \to Z$,
and $R : Z \to W$ be linear transformations. Then we have
$R(ST) = (RS)T$.


Proof. We have to show that $R(ST)(x) = (RS)T(x)$ for all $x \in X$.
But by definition

$$R(ST)(x) = R((ST)(x)) = R(S(T(x))) = (RS)(T(x)) = (RS)T(x)$$


as desired. 




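• Corollary 4. (Matrix multiplication is associative) Let $A$ be an
$m \times n$ matrix, $B$ be an $l \times m$ matrix, and $C$ be a $k \times l$ matrix. Then
$C(BA) = (CB)A$.


• Proof Since $L_A : \mathbf{R}^n \to \mathbf{R}^m$, $L_B : \mathbf{R}^m \to \mathbf{R}^l$, and $L_C : \mathbf{R}^l \to \mathbf{R}^k$ are
linear transformations, we have from the previous Lemma that

$$L_C(L_B L_A) = (L_C L_B) L_A.$$

Let $\alpha, \beta, \gamma, \delta$ be the standard bases of $\mathbf{R}^n$, $\mathbf{R}^m$, $\mathbf{R}^l$, and $\mathbf{R}^k$ respectively.
Then we have

$$[L_C(L_B L_A)]_\alpha^\delta = [L_C]_\gamma^\delta [L_B L_A]_\alpha^\gamma = [L_C]_\gamma^\delta ([L_B]_\beta^\gamma [L_A]_\alpha^\beta) = C(BA)$$

while

$$[(L_C L_B) L_A]_\alpha^\delta = [L_C L_B]_\beta^\delta [L_A]_\alpha^\beta = ([L_C]_\gamma^\delta [L_B]_\beta^\gamma) [L_A]_\alpha^\beta = (CB)A$$

using Lemma 2. Combining these three identities we see that $C(BA) =
(CB)A$.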


• The above proof may seem rather weird, but it managed to prove the
matrix identity C(BA) = (CB)A without having to do lots and lots of
matrix multiplication. Exercise: try proving C(BA) = (CB)A directly
by writing out C, B, A in co-ordinates and expanding both sides!


• We have just shown that matrix multiplication is associative. In fact,
all the familiar rules of algebra apply to matrices (e.g. A(B + C) =
AB + AC, and A times the identity is equal to A) provided that all the
matrix operations make sense, of course. (The shapes of the matrices
have to be compatible before one can even begin to add or multiply
them together). The one important caveat is that matrix multiplication
is not commutative: AB is usually not the same as BA! Indeed there
is no guarantee that these two matrices are the same shape (or even
that they are both defined at all).
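
To see these rules in action, here is a small Python sketch that multiplies a few concrete matrices (chosen arbitrarily for illustration; `matmul` is a hand-written helper, not a library routine). It checks associativity on matrices of compatible shapes, and exhibits a pair of 2 × 2 matrices that do not commute.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows; needs cols(X) == rows(Y)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Shapes as in Corollary 4 with n = 2, m = 3, l = 2, k = 4.
A = [[1, 2], [3, 4], [5, 6]]            # 3 x 2
B = [[1, 0, 2], [0, 1, 1]]              # 2 x 3
C = [[2, 1], [1, 3], [0, 1], [1, 1]]    # 4 x 2

left = matmul(C, matmul(B, A))          # C(BA)
right = matmul(matmul(C, B), A)         # (CB)A
print(left == right)

# Two square matrices whose products in the two orders differ.
P = [[0, 1], [0, 0]]
Q = [[1, 0], [0, 0]]
PQ = matmul(P, Q)   # the zero matrix
QP = matmul(Q, P)   # equal to P itself
print(PQ, QP)
```

Note that for the matrices A and B above, both AB (3 × 3) and BA (2 × 2) are defined, but they do not even have the same shape.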


• Some other properties of $A$ and $L_A$ are stated below. As you can see,
the proofs are similar to the ones above.


• If $A$ is an $m \times n$ matrix and $B$ is an $l \times m$ matrix, then $L_{BA} = L_B L_A$.
Proof: Let $\alpha, \beta, \gamma$ be the standard bases of $\mathbf{R}^n$, $\mathbf{R}^m$, $\mathbf{R}^l$ respectively.
Then $L_B L_A$ is a linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^l$, and so

$$[L_B L_A]_\alpha^\gamma = [L_B]_\beta^\gamma [L_A]_\alpha^\beta = BA,$$

and so by taking $L$ of both sides and using Lemma 2, we obtain $L_B L_A =
L_{BA}$ as desired.


• If $A$ is an $m \times n$ matrix, and $B$ is another $m \times n$ matrix, then $L_{A+B} =
L_A + L_B$. Proof: $L_A + L_B$ is a linear transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$.
Let $\alpha, \beta$ be the standard bases of $\mathbf{R}^n$ and $\mathbf{R}^m$ respectively. Then

$$[L_A + L_B]_\alpha^\beta = [L_A]_\alpha^\beta + [L_B]_\alpha^\beta = A + B$$

and so by taking $L$ of both sides and using Lemma 2, we obtain $L_{A+B} =
L_A + L_B$ as desired.
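
The first of these properties says that composing the maps is the same as multiplying the matrices first. A quick numerical illustration in Python follows; the helpers `make_L` and `matmul` and the particular matrices and test vector are chosen here purely for illustration.

```python
def make_L(A):
    """Return the map x -> Ax for a matrix A given as a list of rows."""
    def L(x):
        return [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return L

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2, so L_A maps R^2 to R^3
B = [[1, 0, 2], [0, 1, 1]]     # 2 x 3, so L_B maps R^3 to R^2

L_A, L_B = make_L(A), make_L(B)
L_BA = make_L(matmul(B, A))    # the map attached to the product BA

x = [7, -2]
composed = L_B(L_A(x))         # apply L_A first, then L_B
direct = L_BA(x)               # apply the single map L_BA
print(composed, direct)
```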


* * * * *


Invertible linear transformations 


• We have already dealt with the concepts of a linear transformation
being one-to-one, and of being onto. We now combine these two concepts
to that of a transformation being invertible.


• Definition. Let $T : V \to W$ be a linear transformation. We say that
a linear transformation $S : W \to V$ is the inverse of $T$ if $TS = I_W$ and
$ST = I_V$. We say that $T$ is invertible if it has an inverse, and call the
inverse $T^{-1}$; thus $TT^{-1} = I_W$ and $T^{-1}T = I_V$.


• Example Let $T : \mathbf{R}^3 \to \mathbf{R}^3$ be the doubling transformation $Tv := 2v$.
Let $S : \mathbf{R}^3 \to \mathbf{R}^3$ be the halving transformation $Sv := v/2$. Then
$S$ is the inverse of $T$: $ST(v) = S(2v) = (2v)/2 = v$, while $TS(v) =
T(v/2) = 2(v/2) = v$, thus both $ST$ and $TS$ are the identity on $\mathbf{R}^3$.


• Note that this definition is symmetric: if $S$ is the inverse of $T$, then $T$
is the inverse of $S$.


• Why do we call $S$ the inverse of $T$ instead of just an inverse? This is
because every transformation can have at most one inverse:




Lemma 6. Let $T : V \to W$ be a linear transformation, and let
$S : W \to V$ and $S' : W \to V$ both be inverses of $T$. Then $S = S'$.


Proof

$$S = S I_W = S(TS') = (ST)S' = I_V S' = S'.$$


Not every linear transformation has an inverse:


Lemma 7. If $T : V \to W$ has an inverse $S : W \to V$, then $T$ must be
one-to-one and onto.


Proof Let’s show that $T$ is one-to-one. Suppose that $Tv = Tv'$; we
have to show that $v = v'$. But by applying $S$ to both sides we get
$STv = STv'$, thus $I_V v = I_V v'$, thus $v = v'$ as desired. Now let’s show
that $T$ is onto. Let $w \in W$; we have to find $v$ such that $Tv = w$. But
$w = I_W w = TSw = T(Sw)$, so if we let $v := Sw$ then we have $Tv = w$
as desired.


Thus, for instance, the zero transformation $T : \mathbf{R}^3 \to \mathbf{R}^3$ defined by
$Tv = 0$ is not invertible.


The converse of Lemma 7 is also true: 


Lemma 8. If $T : V \to W$ is a one-to-one and onto linear
transformation, then it has an inverse $S : W \to V$, which is also a linear
transformation.


Proof Let $T : V \to W$ be one-to-one and onto. Let $w$ be any element
of $W$. Since $T$ is onto, we have $w = Tv$ for some $v$ in $V$; since $T$ is
one-to-one, this $v$ is unique (we can’t have two different elements $v, v'$
of $V$ such that $Tv$ and $Tv'$ are both equal to $w$). Let us define $Sw$
as equal to this $v$, thus $S$ is a transformation from $W$ to $V$. For any
$w \in W$, we have $w = Tv$ and $Sw = v$ for some $v \in V$, and hence
$TSw = w$; thus $TS$ is the identity $I_W$.


Now we show that $ST = I_V$, i.e. that for every $v \in V$, we have $STv = v$.
Since we already know that $TS = I_W$, we have that $TSw = w$ for all
$w \in W$. In particular we have $TSTv = Tv$, since $Tv \in W$. But since
$T$ is injective, this implies that $STv = v$ as desired.


Finally, we show that $S$ is linear, i.e. that it preserves addition and
scalar multiplication. We’ll just show that it preserves addition, and
leave scalar multiplication as an exercise. Let $w, w' \in W$; we need to
show that $S(w + w') = Sw + Sw'$. But we have

$$T(S(w + w')) = (TS)(w + w') = I_W(w + w') = I_W w + I_W w' = TSw + TSw' = T(Sw + Sw');$$

since $T$ is one-to-one, this implies that $S(w + w') = Sw + Sw'$ as desired.
The preservation of scalar multiplication is proven similarly.


Thus a linear transformation is invertible if and only if it is one-to-one 
and onto. Invertible linear transformations are also known as isomor- 
phisms. 


Definition Two vector spaces $V$ and $W$ are said to be isomorphic if
there is an invertible linear transformation $T : V \to W$ from one space
to another.


Example The map $T : \mathbf{R}^3 \to P_2(\mathbf{R})$ defined by

$$T(a, b, c) := ax^2 + bx + c$$

is easily seen to be linear, one-to-one, and onto, and hence an
isomorphism. Thus $\mathbf{R}^3$ and $P_2(\mathbf{R})$ are isomorphic.


Isomorphic spaces tend to have almost identical properties. Here is an 
example: 


Lemma 9. Two finite-dimensional spaces $V$ and $W$ are isomorphic if
and only if $\dim(V) = \dim(W)$.


Proof If $V$ and $W$ are isomorphic, then there is an invertible linear
transformation $T : V \to W$ from $V$ to $W$, which by Lemma 7 is
one-to-one and onto. Since $T$ is one-to-one, $\text{nullity}(T) = 0$. Since $T$ is
onto, $\text{rank}(T) = \dim(W)$. By the dimension theorem we thus have
$\dim(V) = \dim(W)$.


Now suppose that $\dim(V)$ and $\dim(W)$ are equal; let’s say that $\dim(V) =
\dim(W) = n$. Then $V$ has a basis $\{v_1, \ldots, v_n\}$, and $W$ has a basis
$\{w_1, \ldots, w_n\}$. By Theorem 6 of last week’s notes, we can find a
linear transformation $T : V \to W$ such that $Tv_1 = w_1, \ldots, Tv_n = w_n$.
By Theorem 3 of last week’s notes, $w_1, \ldots, w_n$ must then span $R(T)$.
But since $w_1, \ldots, w_n$ span $W$, we have $R(T) = W$, i.e. $T$ is onto. By
Lemma 2 of last week’s notes, $T$ is therefore one-to-one, and hence is
an isomorphism. Thus $V$ and $W$ are isomorphic.


• Every basis leads to an isomorphism. If $V$ has a finite basis $\beta =
(v_1, \ldots, v_n)$, then the co-ordinate map $\phi_\beta : V \to \mathbf{R}^n$ defined by

$$\phi_\beta(x) := [x]^\beta$$

is a linear transformation (see last week’s homework), and is invertible
(this was discussed in last week’s notes, where we noted that we can
reconstruct $x$ from $[x]^\beta$ and vice versa). Thus $\phi_\beta$ is an isomorphism
between $V$ and $\mathbf{R}^n$. In the textbook $\phi_\beta$ is called the standard representation
of $V$ with respect to $\beta$.


• Because of all this theory, we are able to essentially equate
finite-dimensional vector spaces $V$ with the standard vector spaces $\mathbf{R}^n$, to
equate vectors $v \in V$ with their co-ordinate vectors $[v]^\alpha \in \mathbf{R}^n$
(provided we choose a basis $\alpha$ for $V$) and linear transformations $T : V \to W$
from one finite-dimensional space to another, with matrices $[T]_\alpha^\beta$
(of size $m \times n$ when $V$ is $n$-dimensional and $W$ is $m$-dimensional).
This means that, for finite-dimensional linear algebra at least, we can
reduce everything to the study of column vectors and matrices. This
is what we will be doing for the rest of this course.


* * * * *


Invertible linear transformations and invertible matrices 


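• An $m \times n$ matrix $A$ has an inverse $B$, if $B$ is an $n \times m$ matrix such that
$BA = I_n$ and $AB = I_m$. In this case we call $A$ an invertible matrix,
and denote $B$ by $A^{-1}$.


• Example. If

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$

then

$$A^{-1} = \begin{pmatrix} 1/2 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}$$

is the inverse of $A$, as can be easily checked.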


The relationship between invertible linear transformations and
invertible matrices is the following:


Theorem 10. Let $V$ be a vector space with finite ordered basis $\alpha$,
and let $W$ be a vector space with finite ordered basis $\beta$. Then a linear
transformation $T : V \to W$ is invertible if and only if the matrix $[T]_\alpha^\beta$
is invertible. Furthermore, $([T]_\alpha^\beta)^{-1} = [T^{-1}]_\beta^\alpha$.


Proof. Suppose that $V$ is $n$-dimensional and $W$ is $m$-dimensional; this
makes $[T]_\alpha^\beta$ an $m \times n$ matrix.


First suppose that $T : V \to W$ has an inverse $T^{-1} : W \to V$. Then

$$[T]_\alpha^\beta [T^{-1}]_\beta^\alpha = [TT^{-1}]_\beta^\beta = [I_W]_\beta^\beta = I_m$$

while

$$[T^{-1}]_\beta^\alpha [T]_\alpha^\beta = [T^{-1}T]_\alpha^\alpha = [I_V]_\alpha^\alpha = I_n,$$

thus $[T^{-1}]_\beta^\alpha$ is the inverse of $[T]_\alpha^\beta$ and so $[T]_\alpha^\beta$ is invertible.


Now suppose that $[T]_\alpha^\beta$ is invertible, with inverse $B$. We’ll prove shortly
that there exists a linear transformation $S : W \to V$ with $[S]_\beta^\alpha = B$.
Assuming this for the moment, we have

$$[ST]_\alpha^\alpha = [S]_\beta^\alpha [T]_\alpha^\beta = B[T]_\alpha^\beta = I_n = [I_V]_\alpha^\alpha$$

and hence $ST = I_V$. A similar argument gives $TS = I_W$, and so $S$ is
the inverse of $T$ and so $T$ is invertible.


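It remains to show that we can in fact find a transformation $S : W \to V$
with $[S]_\beta^\alpha = B$. Write $\alpha = (v_1, \ldots, v_n)$ and $\beta = (w_1, \ldots, w_m)$, and
let $B_{ij}$ denote the entry of the $n \times m$ matrix $B$ in row $i$ and column $j$.
Since the $j$-th column of $[S]_\beta^\alpha$ consists of the co-ordinates of $Sw_j$ with
respect to $\alpha$, we want a linear transformation $S : W \to V$ such that

$$\begin{aligned}
Sw_1 &= B_{11} v_1 + \ldots + B_{n1} v_n \\
Sw_2 &= B_{12} v_1 + \ldots + B_{n2} v_n \\
&\;\;\vdots \\
Sw_m &= B_{1m} v_1 + \ldots + B_{nm} v_n.
\end{aligned}$$

But we can do this thanks to Theorem 6 of last week’s notes.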


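Corollary 11. An $m \times n$ matrix $A$ is invertible if and only if the linear
transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$ is invertible. Furthermore, the inverse
of $L_A$ is $L_{A^{-1}}$.


Proof. If $\alpha$ is the standard basis for $\mathbf{R}^n$ and $\beta$ is the standard basis
for $\mathbf{R}^m$, then

$$[L_A]_\alpha^\beta = A.$$

Thus by Theorem 10, $A$ is invertible if and only if $L_A$ is. Also, from
Theorem 10 we have

$$[L_A^{-1}]_\beta^\alpha = ([L_A]_\alpha^\beta)^{-1} = A^{-1} = [L_{A^{-1}}]_\beta^\alpha$$

and hence

$$L_A^{-1} = L_{A^{-1}}$$

as desired.


Corollary 12. In order for an $m \times n$ matrix $A$ to be invertible, it must be
square (i.e. $m = n$).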


Proof. This follows immediately from Corollary 11 and Lemma 9. 


On the other hand, not all square matrices are invertible; for instance 
the zero matrix clearly does not have an inverse. More on this in a 
later week. 




Math 115A - Week 5 
Textbook sections: 1.1-2.5 
Topics covered: 


Co-ordinate changes 


Stuff about the midterm 


* * * * *


Changing the basis 


In last week’s notes, we used bases to convert vectors to co-ordinate 
vectors, and linear transformations to matrices. We have already men- 
tioned that if one changes the basis, then the co-ordinate vectors and 
matrices also change. Now we study this phenomenon more carefully, 
and quantify exactly how changing the basis changes these co-ordinate 
vectors and matrices. 


Let’s begin with co-ordinate vectors. Suppose we have a vector space
$V$ and two ordered bases $\beta$, $\beta'$ of that vector space. Suppose we also
have a vector $v$ in $V$. Then one can write $v$ as a co-ordinate vector
either with respect to $\beta$ - thus obtaining $[v]^\beta$ - or with respect to $\beta'$,
thus obtaining $[v]^{\beta'}$.
The question is now: how are $[v]^\beta$ and $[v]^{\beta'}$ related?


Fortunately, this question can easily be resolved with the help of the
identity operator $I_V : V \to V$ on $V$. By definition, we have

$$I_V v = v.$$

We now convert this equation to matrices, but with a twist: we measure
the domain $V$ using the basis $\beta$, but the range $V$ using the basis $\beta'$!
This gives

$$[I_V]_\beta^{\beta'} [v]^\beta = [v]^{\beta'}.$$

Thus we now know how to convert from basis $\beta$ to basis $\beta'$:

$$[v]^{\beta'} = [I_V]_\beta^{\beta'} [v]^\beta. \qquad (0.8)$$

(If you like, the two bases $\beta$ have “cancelled each other” on the
right-hand side).




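• Example. Let $V = \mathbf{R}^2$, and consider both the standard ordered basis
$\beta := ((1,0), (0,1))$ and a non-standard ordered basis $\beta' := ((1,1), (1,-1))$.
Let’s pick a vector $v \in \mathbf{R}^2$ at random; say $v := (5,3)$. Then one can
easily check that $[v]^\beta = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$ and $[v]^{\beta'} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}$ (why?).


• Now let’s work out $[I_V]_\beta^{\beta'}$. Thus we are applying $I_V$ to elements of $\beta$
and writing them in terms of $\beta'$. Since

$$I_V(1,0) = (1,0) = \tfrac{1}{2}(1,1) + \tfrac{1}{2}(1,-1)$$

and

$$I_V(0,1) = (0,1) = \tfrac{1}{2}(1,1) - \tfrac{1}{2}(1,-1)$$

we thus see that

$$[I_V]_\beta^{\beta'} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}.$$

We can indeed verify the formula (0.8), which in this case becomes

$$\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 5 \\ 3 \end{pmatrix}.$$

Note that $[I_V]_\beta^{\beta'}$ is different from $[I_V]_\beta^\beta$, $[I_V]_{\beta'}^{\beta'}$, or $[I_V]_{\beta'}^\beta$. For instance,
we have

$$[I_V]_\beta^\beta = [I_V]_{\beta'}^{\beta'} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

(why?), while

$$[I_V]_{\beta'}^\beta = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

(why?). Note also that $[I_V]_{\beta'}^\beta$ is the inverse of $[I_V]_\beta^{\beta'}$ (can you see why
this should be the case, without doing any matrix multiplication?)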
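
For readers who like to check such things by machine, here is a short Python sketch of the $\mathbf{R}^2$ example. The helper names are invented for illustration, and the co-ordinates with respect to $\beta' = ((1,1),(1,-1))$ are obtained by solving $a(1,1) + b(1,-1) = (x,y)$ by hand, which gives $a = (x+y)/2$ and $b = (x-y)/2$.

```python
from fractions import Fraction

def coords_beta_prime(v):
    """Co-ordinates (a, b) of v with respect to beta' = ((1,1), (1,-1)),
    i.e. the solution of a*(1,1) + b*(1,-1) = v."""
    x, y = v
    return [Fraction(x + y, 2), Fraction(x - y, 2)]

def apply(M, x):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

beta = [(1, 0), (0, 1)]

# Column j of the change of co-ordinate matrix from beta to beta' holds the
# beta'-co-ordinates of the j-th vector of beta.
columns = [coords_beta_prime(b) for b in beta]
Q = [[columns[j][i] for j in range(2)] for i in range(2)]

v_beta = [5, 3]                    # co-ordinates of v = (5, 3) in beta
v_beta_prime = apply(Q, v_beta)    # formula (0.8)
print(Q, v_beta_prime)
```

The result agrees with computing the $\beta'$ co-ordinates of $(5,3)$ directly, which is exactly what formula (0.8) asserts.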


• A scalar example. Let $V$ be the space of all lengths; this has a basis
$\beta := (\text{yard})$, and a basis $\beta' := (\text{foot})$. Since the identity $I_V$ applied to
a yard yields three feet, we have

$$[I_V]_{(\text{yard})}^{(\text{foot})} = (3)$$

(i.e. the identity on lengths is three feet per yard). Thus for any length
$v$,

$$[v]^{(\text{foot})} = [I_V]_{(\text{yard})}^{(\text{foot})} [v]^{(\text{yard})} = (3)[v]^{(\text{yard})}.$$

Thus for instance, if $[v]^{(\text{yard})} = (4)$ (so $v$ is four yards), then $[v]^{(\text{foot})}$
must equal $(3)(4) = (12)$ (i.e. $v$ is also twelve feet).

Conversely, we have $[I_V]_{(\text{foot})}^{(\text{yard})} = (1/3)$ (i.e. we have 1/3 yards per foot).

The matrix $[I_V]_\beta^{\beta'}$ is called the change of co-ordinate matrix from $\beta$ to
$\beta'$; it is the matrix we use to multiply by when we want to convert $\beta$
co-ordinates to $\beta'$. Very loosely speaking, $[I_V]_\beta^{\beta'}$ measures how much of
$\beta'$ lies in $\beta$ (just as $[I_V]_{(\text{yard})}^{(\text{foot})}$ measures how many feet lie in a yard).


Change of co-ordinate matrices are always square (why?) and always
invertible (why?).


Example Let $V := P_2(\mathbf{R})$, and consider the two bases $\beta := (1, x, x^2)$
and $\beta' := (x^2, x, 1)$ of $V$. Then

$$[I_V]_\beta^{\beta'} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$$

(why?).


Example Suppose we have a mixture of carbon dioxide $CO_2$ and carbon
monoxide $CO$ molecules. Let $V$ be the vector space of all such
mixtures, so it has an ordered basis $\beta := (CO_2, CO)$. One can also
use just the basis $\beta' = (C, O)$ of carbon and oxygen atoms (where we
are ignoring the chemical bonds, etc., and treating each molecule as
simply the sum of its components, thus $CO_2 = 1 \times C + 2 \times O$ and
$CO = 1 \times C + 1 \times O$). Then we have

$$[I_V]_\beta^{\beta'} = \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix};$$

this can be interpreted as saying that $CO_2$ contains 1 atom of carbon
and 2 atoms of oxygen, while $CO$ contains 1 atom of carbon and 1
atom of oxygen. The inverse change of co-ordinate matrix is

$$[I_V]_{\beta'}^\beta = ([I_V]_\beta^{\beta'})^{-1} = \begin{pmatrix} -1 & 1 \\ 2 & -1 \end{pmatrix};$$

this can be interpreted as saying that an atom of carbon is equivalent to
−1 molecules of $CO_2$ and +2 molecules of $CO$, while an atom of oxygen
is equivalent to 1 molecule of $CO_2$ and −1 molecules of $CO$. Note how
the inverse matrix does not quite behave the way one might naively
expect; for instance, given the factor of 2 in the conversion matrix $[I_V]_\beta^{\beta'}$,
one might expect a factor of 1/2 in the inverse conversion matrix $[I_V]_{\beta'}^\beta$;
instead we get strange negative numbers all over the place. (To put it
another way, since $CO_2$ contains two atoms of oxygen, why shouldn’t
oxygen consist of 1/2 of a molecule of $CO_2$? Think about it).
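
The inverse matrix in the molecule example can be double-checked with the usual formula for the inverse of a 2 × 2 matrix. The Python sketch below uses hand-written helpers (`inverse_2x2`, `apply`) and an arbitrarily chosen test mixture of 3 carbon atoms and 5 oxygen atoms; none of these names or numbers come from the notes.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def apply(M, x):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Change of co-ordinates from beta = (CO2, CO) to beta' = (C, O):
# column 1 says CO2 = 1 C + 2 O, column 2 says CO = 1 C + 1 O.
Q = [[1, 1], [2, 1]]
Q_inv = inverse_2x2(Q)            # from (C, O) back to (CO2, CO)

atoms = [3, 5]                    # hypothetical mixture: 3 C and 5 O
molecules = apply(Q_inv, atoms)   # the same mixture in (CO2, CO) co-ordinates
back = apply(Q, molecules)        # converting back should return the atoms
print(Q_inv, molecules, back)
```

Here the mixture comes out as 2 molecules of $CO_2$ plus 1 molecule of $CO$, which indeed contains 3 carbon and 5 oxygen atoms.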


Example (This rather lengthy example is only for physics-oriented 
students who have some exposure to special relativity; everyone else 
can safely skip this example). One of the fundamental concepts of 
special relativity is that space and time should be treated together as 
a single vector space, and that different observers use different bases to 
measure space and time. 


For simplicity, let us assume that space is one-dimensional; people can 
move in only two directions, which we will call right and left. Let’s say 
that observers measure time in years, and space in light-years. 


Let’s say there are two observers, Alice and Bob. Alice is an inertial
observer, which means that she is not accelerating. Bob is another
inertial observer, but travelling at a fixed speed $\frac{3}{5}c$ to the right, as
measured by Alice; here $c$ is of course the speed of light.


An event is something which happens at a specific point in space and 
time. Any two events are separated by some amount of time and some 
amount of distance; however the amount of time and distance that 
separates them depends on which observer is perceiving. For instance, 
Alice might measure that event Y occurred 8 years later and 4 light- 
years to the right of event X, while Bob might measure the distance 
and duration between the two events differently. (In this case, it turns
out that Bob measures Y as occurring 7 years later and 1 light-year to
the left of event X).


Let V denote the vector space of all possible displacements in both 
space and time; this is a two-dimensional vector space, because we 
are assuming space to be one-dimensional. (In real life, space is three 
dimensional, and so spacetime is four dimensional). To measure things 
in this vector space, Alice has a unit of length - let’s call it the Alice- 
light-year, and a unit of time - the Alice-year. Thus in the above 
example, if we call $v$ the vector from event X to event Y, then $v =
8 \times \text{Alice-year} + 4 \times \text{Alice-light-year}$. (We’ll adopt the convention
that a displacement of length in the right direction is positive, while
a displacement in the left direction is negative). Thus Alice uses the
ordered basis (Alice-light-year, Alice-year) to span the space $V$.


Similarly, Bob has the ordered basis (Bob-light-year, Bob-year).
These bases are related by the Lorentz transformations

$$\text{Alice-light-year} = \tfrac{5}{4}\,\text{Bob-light-year} - \tfrac{3}{4}\,\text{Bob-year}$$

$$\text{Alice-year} = -\tfrac{3}{4}\,\text{Bob-light-year} + \tfrac{5}{4}\,\text{Bob-year}$$

(this is because of Bob’s velocity $\frac{3}{5}c$; different velocities give different
transformations, of course. A derivation of the Lorentz transformations
from Einstein’s postulates of relativity is not too difficult, but is beyond
the scope of this course). In other words, we have

$$[I_V]_{(\text{Alice-light-year},\,\text{Alice-year})}^{(\text{Bob-light-year},\,\text{Bob-year})} = \begin{pmatrix} 5/4 & -3/4 \\ -3/4 & 5/4 \end{pmatrix}.$$


Some examples. Suppose Alice emits a flash of light (event X), waits
for one year without moving, and emits another flash of light (event
Y). Let $v$ denote the vector from X to Y. Then from Alice’s point of
view, $v$ consists of one year and 0 light-years (because she didn’t move
between events):

$$[v]^{(\text{Alice-light-year},\,\text{Alice-year})} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

and thus

$$[v]^{(\text{Bob-light-year},\,\text{Bob-year})} = \begin{pmatrix} 5/4 & -3/4 \\ -3/4 & 5/4 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -3/4 \\ 5/4 \end{pmatrix};$$

in other words, Bob perceives event Y occurring 5/4 years afterward
and 3/4 light-years to the left of event X. This is consistent with the
assumption that Bob was moving to the right at $\frac{3}{5}c$ with respect to
Alice, so that from Bob’s point of view Alice is receding to the left
at $-\frac{3}{5}c$. This also illustrates the phenomenon of time dilation - what
appears to be a single year from Alice’s point of view becomes 5/4 years
when measured by Bob.


Another example. Suppose a beam of light was emitted by some source
(event A) in a left-ward direction and absorbed by some receiver (event
B) some time later. Suppose that Alice perceives event B as occurring
one year after, and one light-year to the left of, event A; this is of course
consistent with light traveling at 1c. Thus if $w$ is the vector from A to
B, then

$$[w]^{(\text{Alice-light-year},\,\text{Alice-year})} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

and thus

$$[w]^{(\text{Bob-light-year},\,\text{Bob-year})} = \begin{pmatrix} 5/4 & -3/4 \\ -3/4 & 5/4 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 2 \end{pmatrix}.$$

Thus Bob views event B as occurring two years after and two light-years
to the left of event A. Thus Bob still measures the speed of light as 1c
(indeed, one of the postulates of relativity is that the speed of light is
always a constant c to all inertial observers), but the light is “stretched
out” over two years instead of one, resulting in Bob seeing the light
at half the frequency that Alice would. This is the famous Doppler
red shift effect in relativity (receding light has lower frequency and is
red-shifted; approaching light has higher frequency and is blue-shifted).


It may not seem like it, but this situation is symmetric with respect to
Alice and Bob. We have

$$[I_V]_{(\text{Bob-light-year},\,\text{Bob-year})}^{(\text{Alice-light-year},\,\text{Alice-year})} = \left([I_V]_{(\text{Alice-light-year},\,\text{Alice-year})}^{(\text{Bob-light-year},\,\text{Bob-year})}\right)^{-1} = \begin{pmatrix} 5/4 & -3/4 \\ -3/4 & 5/4 \end{pmatrix}^{-1} = \begin{pmatrix} 5/4 & 3/4 \\ 3/4 & 5/4 \end{pmatrix}$$

as one can verify by multiplying the above two matrices together (we
will discuss matrix inversion in more detail next week). In other words,

$$\text{Bob-light-year} = \tfrac{5}{4}\,\text{Alice-light-year} + \tfrac{3}{4}\,\text{Alice-year}$$

$$\text{Bob-year} = \tfrac{3}{4}\,\text{Alice-light-year} + \tfrac{5}{4}\,\text{Alice-year}.$$

Thus for instance, just as Alice’s years are time-dilated when measured
by Bob, Bob’s years are time-dilated when measured by Alice: if Bob
emits light (event X’), waits for one year without moving (though of
course still drifting at $\frac{3}{5}c$ as measured by Alice), and emits more light
(event Y’), then Alice will perceive event Y’ as occurring 5/4 years
after and 3/4 light-years to the right of event X’; this is consistent with
Bob travelling at $\frac{3}{5}c$, but Bob’s year has been time dilated to 5/4 years.
(Why is it not contradictory for Alice’s years to be time dilated when
measured by Bob, and for Bob’s years to be time dilated when measured
by Alice? This is similar to the $(CO_2, CO)$ versus $(C, O)$ example: one
molecule of $CO_2$ contains two atoms of oxygen (plus some carbon), while
an atom of oxygen consists of one molecule of $CO_2$ (minus some $CO$),
and this is not contradictory. Vectors behave slightly differently from
scalars sometimes).
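
The arithmetic in the relativity example is easy to replay with exact fractions. The following Python sketch assumes the ordered bases (light-year, year) used above and Bob's velocity of 3/5 of the speed of light; the helper names `apply` and `matmul` are made up for illustration.

```python
from fractions import Fraction as F

def apply(M, x):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Change of co-ordinates from Alice's basis to Bob's basis, and back.
alice_to_bob = [[F(5, 4), F(-3, 4)], [F(-3, 4), F(5, 4)]]
bob_to_alice = [[F(5, 4), F(3, 4)], [F(3, 4), F(5, 4)]]

# Displacements are written as [light-years to the right, years].
one_alice_year = apply(alice_to_bob, [0, 1])    # Alice waits one year
light_beam = apply(alice_to_bob, [-1, 1])       # leftward beam of light
x_to_y = apply(alice_to_bob, [4, 8])            # events X and Y from earlier
one_bob_year = apply(bob_to_alice, [0, 1])      # Bob waits one year

product = matmul(bob_to_alice, alice_to_bob)    # should be the identity
print(one_alice_year, light_beam, x_to_y, one_bob_year, product)
```

The third computation recovers the earlier claim that Bob sees Y occurring 7 years after, and 1 light-year to the left of, X.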


* * * * *


Co-ordinate change and matrices 


• Let $T : V \to V$ be a linear transformation from a vector space $V$
to itself. (Such transformations are sometimes called endomorphisms
or linear operators on $V$, because they map $V$ into itself; the invertible
ones are called automorphisms). Given any basis $\beta$ of $V$, we can
form a matrix $[T]_\beta^\beta$ representing $T$ in the basis $\beta$. Of course, if we
change the basis, from $\beta$ to a different basis, say $\beta'$, then the matrix
changes also, to $[T]_{\beta'}^{\beta'}$. However, the two are related.


• Lemma 1. Let $V$ be a vector space with two bases $\beta'$ and $\beta$, and let
$Q := [I_V]_\beta^{\beta'}$ be the change of co-ordinates matrix from $\beta$ to $\beta'$. Let
$T : V \to V$ be a linear transformation. Then $[T]_\beta^\beta$ and $[T]_{\beta'}^{\beta'}$ are related
by the formula

$$[T]_{\beta'}^{\beta'} = Q [T]_\beta^\beta Q^{-1}.$$

Proof We begin with the obvious identity

$$T = I_V T I_V$$

and take bases (using Theorem 1 from last week’s notes) to obtain

$$[T]_{\beta'}^{\beta'} = [I_V]_\beta^{\beta'} [T]_\beta^\beta [I_V]_{\beta'}^\beta.$$

Substituting $[I_V]_\beta^{\beta'} = Q$ and $[I_V]_{\beta'}^\beta = ([I_V]_\beta^{\beta'})^{-1} = Q^{-1}$ we obtain the
Lemma.


Example Let’s take a very simple portfolio model, where the portfolio
consists of one type of stock (let’s say GM stock), and one type of
cash (let’s say US dollars, invested in a money market fund). Thus
a portfolio lives in a two-dimensional space $V$, with a basis $\beta :=
(\text{Stock}, \text{Dollar})$. Let’s say that over the course of a year, a unit of
GM stock issues a dividend of two dollars, while a dollar invested in
the money market fund would earn 2 percent, so that 1 dollar becomes
1.02 dollars. We can then define the linear transformation $T : V \to V$,
which denotes how much a portfolio will appreciate within one year. (If
one wants to do other operations on the portfolio, such as buy and sell
stock, etc., this would require other linear transformations; but in this
example we will just analyze plain old portfolio appreciation). Since

$$T(1 \times \text{Stock}) = 1 \times \text{Stock} + 2 \times \text{Dollar}$$

$$T(1 \times \text{Dollar}) = 1.02 \times \text{Dollar}$$

we thus see that

$$[T]_\beta^\beta = \begin{pmatrix} 1 & 0 \\ 2 & 1.02 \end{pmatrix}.$$

Now let’s measure $T$ in a different basis. Suppose that GM’s stock has
split, so that each old unit of Stock becomes two units of Newstock
(so Newstock = 0.5 Stock). Also, suppose for some reason
(decimalization?) we wish to measure money in cents instead of dollars. So we
now have a new basis $\beta' = (\text{Newstock}, \text{Cent})$. Then we have

$$\text{Newstock} = 0.5\,\text{Stock} + 0\,\text{Dollar}; \quad \text{Cent} = 0\,\text{Stock} + 0.01\,\text{Dollar},$$

so

$$Q^{-1} = [I_V]_{\beta'}^\beta = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.01 \end{pmatrix},$$

while similar reasoning gives

$$Q = [I_V]_\beta^{\beta'} = \begin{pmatrix} 2 & 0 \\ 0 & 100 \end{pmatrix}.$$

Thus

$$[T]_{\beta'}^{\beta'} = Q [T]_\beta^\beta Q^{-1} = \begin{pmatrix} 2 & 0 \\ 0 & 100 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 2 & 1.02 \end{pmatrix} \begin{pmatrix} 0.5 & 0 \\ 0 & 0.01 \end{pmatrix}$$

which simplifies to

$$[T]_{\beta'}^{\beta'} = \begin{pmatrix} 1 & 0 \\ 100 & 1.02 \end{pmatrix}.$$

Thus

$$T(\text{Newstock}) = 1 \times \text{Newstock} + 100 \times \text{Cent}$$
$$T(\text{Cent}) = 0 \times \text{Newstock} + 1.02 \times \text{Cent}.$$

This can of course be deduced directly from our hypotheses; it is
instructive to do so and to compare that with the matrix computation.
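
Here is the portfolio computation as a small Python sketch, with exact fractions standing in for the decimals (1.02 = 51/50). The matrices are the ones from the example; the helpers `matmul` and `apply` are hand-written for illustration. The last lines also carry out the direct deduction for Newstock: convert to the old basis, apply $T$, and convert back.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, x):
    """Matrix-vector product, with M stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

T_old = [[F(1), F(0)], [F(2), F(51, 50)]]      # T in the basis (Stock, Dollar)
Q = [[F(2), F(0)], [F(0), F(100)]]             # from (Stock, Dollar) to (Newstock, Cent)
Q_inv = [[F(1, 2), F(0)], [F(0), F(1, 100)]]   # from (Newstock, Cent) back

T_new = matmul(matmul(Q, T_old), Q_inv)        # T in the basis (Newstock, Cent)

# Direct check for one unit of Newstock: it is half a unit of Stock, which
# appreciates to half a Stock plus one Dollar, i.e. 1 Newstock plus 100 Cents.
newstock_old = apply(Q_inv, [1, 0])
appreciated_new = apply(Q, apply(T_old, newstock_old))
print(T_new, appreciated_new)
```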


• Definition Two $n \times n$ matrices $A$, $B$ are said to be similar if one has
$B = QAQ^{-1}$ for some invertible $n \times n$ matrix $Q$.


• Thus the two matrices $[T]_\beta^\beta$ and $[T]_{\beta'}^{\beta'}$ are similar. Similarity is an
important notion in linear algebra and we will return to this property
later.


* * * * *


Common sources of confusion 




• This course is very much about concepts, and about thinking clearly and
precisely about these concepts. It is particularly important not to
confuse two concepts which are similar but not identical, otherwise this
can lead to one getting hopelessly lost when trying to work one’s way
through a problem. (This is true not just of mathematics, but also of
other languages, such as English. If one confuses similar words - e.g. the
adjective “happy” with the noun “happiness” - then one still might be
able to read and write simple sentences and still be able to communicate
(although one may sound unprofessional while doing so), but complex
sentences will become very difficult to comprehend). Here I will list
some examples of similar concepts that should be distinguished. These
points may appear pedantic, but an inability to separate these concepts
is usually a sign of some more fundamental problem in comprehending
the material, and should be addressed as quickly as possible.


• “Vector” versus “Vector space”. A vector space consists of vectors,
but is not actually a vector itself. Thus questions like “What is the
dimension of $(1,2,3,4)$?” are meaningless; $(1,2,3,4)$ is a vector, not a
vector space, and only vector spaces have a concept of dimension. A
question such as “What is the dimension of $(x_1, 2x_1, x_1)$?” or “What is
the dimension of $x_1 + x_2 + x_3 = 0$?” is also meaningless for the same
reason, although “What is the dimension of $\{(x_1, 2x_1, x_1) : x_1 \in \mathbf{R}\}$?”
or “What is the dimension of $\{(x_1, x_2, x_3) : x_1 + x_2 + x_3 = 0\}$?” are
not.


• In a similar spirit, the zero vector $0$ is distinct from the zero vector
space $\{0\}$, and is in turn distinct from the zero linear transformation
$T_0$. (And then there is also the zero scalar 0).


• A set S of vectors, versus the span span(S) of that set. This
is a similar problem. A question such as “What is the dimension of
$\{(1,0,0), (0,1,0), (0,0,1)\}$?” is meaningless, because the set $\{(1,0,0), (0,1,0), (0,0,1)\}$
is not a vector space. It is true that this set spans a vector space - $\mathbf{R}^3$
in this example - but that is a different object. Similarly, it is not correct
to say that the sets $\{(1,0,0), (0,1,0), (0,0,1)\}$ and $\{(1,1,0), (1,0,1), (0,0,1)\}$
are equal - they may span the same space, but they are certainly not
the same set.


• For a similar reason, it is not true that if you add $(0,0,1)$ to $\mathbf{R}^2$, that
you “get” $\mathbf{R}^3$. First of all, $\mathbf{R}^2$ is not even contained in $\mathbf{R}^3$, although
the xy-plane $V := \{(x, y, 0) : x, y \in \mathbf{R}\}$, which is isomorphic to $\mathbf{R}^2$, is
contained in $\mathbf{R}^3$. But also, if you take the union of $V$ with $\{(0,0,1)\}$,
one just gets the set $V \cup \{(0,0,1)\}$, which is a plane with an additional
point added to it. The correct statement is that $V$ and $(0,0,1)$ together
span or generate $\mathbf{R}^3$.


• “Finite” versus “finite-dimensional”. A set is finite if it has
finitely many elements. A vector space is finite-dimensional if it has
a finite basis. The two notions are distinct. For instance, $\mathbf{R}^2$ is
infinite (there are infinitely many points in the plane), but is
finite-dimensional because it has a finite basis $\{(1,0), (0,1)\}$. The zero vector
space $\{0\}$ contains just one element, and is zero-dimensional. The set
$\{(1,0,0), (0,1,0), (0,0,1)\}$ is finite, but it does not have a dimension
because it is not a vector space.


• “Subset” versus “subspace” Let $V$ be a vector space. A subset
$U$ of $V$ is any collection of vectors in $V$. If this subset is also
closed under addition and scalar multiplication, we call $U$ a subspace.
Thus every subspace is a subset, but not conversely. For instance,
$\{(1,0,0), (0,1,0), (0,0,1)\}$ is a subset of $\mathbf{R}^3$, but is not a subspace (so
it does not have a dimension, for instance).


• “Nullity” versus “null space” The null space $N(T)$ of a linear
transformation $T : V \to W$ is defined as the space $N(T) = \{v \in V : Tv =
0\}$; it is a subspace of $V$. The nullity $\text{nullity}(T)$ is the dimension of
$N(T)$; it is a number. So statements such as “the null space of T is
three” or “the nullity of T is $\mathbf{R}^3$” are meaningless. Similarly one should
distinguish “range” and “rank”.


• “Range R(T)” versus “final space”. This is an unfortunate notation
problem. If one has a transformation T : V → W, which maps elements
in V to elements in W, V is sometimes referred to as the domain and W
is referred to as the range. However, this notation is in conflict with the
range R(T), defined by R(T) = {Tv : v ∈ V}. If the transformation
T is onto, then R(T) and W are the same, but otherwise they are not. 
To avoid this confusion, I will try to refer to V as the initial space of 




T, and W as the final space. Thus in the map T : R^2 → R^3 defined by
T(x_1, x_2) := (x_1, x_2, 0), the final space is R^3, but the range is only the
xy-plane, which is a subspace of R^3.


To re-iterate, in order to be able to say that a transformation T maps
V to W, it is not required that every element of W is actually covered 
by T; if that is the case, we say that T maps V onto W, and not just 
to W. It is also not required that different elements of V have to map 
to different elements of W; that is true for one-to-one transformations, 
but not for general transformations. So in particular, if you are given 
that T : V → W is a linear transformation, you cannot necessarily
assume that T is one-to-one or onto unless that is explicitly indicated
in the question. 


“Vector” versus “Co-ordinate vector”. A vector v in a vector space
V is not the same thing as its co-ordinate vector [v]^β with respect to
some basis β of V. For one thing, v needn’t be a traditional vector (a
row or column of numbers) at all; it may be a polynomial, or a matrix,
or a function - these are all valid examples of vectors in a vector space.
Secondly, the co-ordinate vector [v]^β depends on the basis β as well
as on the vector v itself; change β and you change the co-ordinate
vector [v]^β, even if v itself is kept fixed. For instance, if V = P_2(R) and
v = 3x^2 + 4x + 5, then v is a vector (even though it’s also a polynomial).
If β := (1, x, x^2), then

\[ [v]^\beta = \begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix}, \]

but this is not the same object as 3x^2 + 4x + 5; one is a column vector
and the other is a polynomial. Calling them “the same” will lead to
trouble if one then switches to another basis, such as β' := (x^2, x, 1),
since the co-ordinate vector is now

\[ [v]^{\beta'} = \begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix}, \quad \text{which is clearly different from} \quad \begin{pmatrix} 5 \\ 4 \\ 3 \end{pmatrix}. \]


There is of course a similar distinction between a linear transformation
T, and the matrix representation [T]_β^γ of that transformation T.
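
The dependence of the co-ordinate vector on the basis can be checked with a few lines of Python. This is only a sketch: storing a polynomial as a dictionary from degree to coefficient, and describing a basis of monomials by its list of degrees, are conventions chosen here for illustration and are not part of the notes.

```python
# v = 3x^2 + 4x + 5, stored as {degree: coefficient} (an illustrative encoding)
v = {2: 3, 1: 4, 0: 5}

def coordinates(poly, basis_degrees):
    """Co-ordinate vector of poly with respect to an ordered basis of monomials.

    basis_degrees lists the degree of each basis monomial in order, so
    (0, 1, 2) stands for the ordered basis (1, x, x^2).
    """
    return [poly.get(d, 0) for d in basis_degrees]

beta = (0, 1, 2)        # beta  = (1, x, x^2)
beta_prime = (2, 1, 0)  # beta' = (x^2, x, 1)

print(coordinates(v, beta))        # [5, 4, 3]
print(coordinates(v, beta_prime))  # [3, 4, 5]
```

The polynomial v is the same object in both calls; only the ordered basis changes, and with it the column of numbers.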


“Closed under addition” versus “preserves addition”. A set U 
is closed under addition if, whenever x and y are elements of U, the sum 




x+y is also in U. A transformation T : V → W preserves addition
if, whenever x and y are elements of V, we have T(x + y) = Tx + Ty.
The two concepts refer to two different things - one has to do with a 
set, and the other with a transformation - but surprisingly it is still 
possible to confuse the two. For instance, in one of the homework 
questions, one has to show that the set T(U) is closed under addition, 
and someone thought this meant that one had to show that T(x+y) =
Tx+Ty for all x, y in U; probably what happened was that the presence
of the letter T in the set T(U) caused one to automatically think of the
preserving-addition property rather than the closed-under-addition property.
Of course, there is a similar distinction between “closed under scalar
multiplication” and “preserves scalar multiplication”.


• A vector v, versus the image Tv of that vector. A linear
transformation T : V → W can take any vector v in V, and transform it to
a new vector Tv in W. However, it is dangerous to say things like v
“becomes” Tv, or v “is now” Tv. If one is not careful, one may soon
write that v “is” Tv, or think that every property that v has
automatically also holds for Tv. The correct thing to say is that Tv is the
image of v under T; this image may preserve some properties of the
original vector v, but it may distort or destroy others. In general, Tv is
not equal to v (except in special cases, such as when T is the identity 
transformation). 


e Hypothesis versus Conclusion. This is not a confusion specific to 
linear algebra, but nevertheless is an important distinction to keep in 
mind when doing any sort of “proof” type question. You should always 
know what you are currently assuming (the hypotheses), and what you 
are trying to prove (the conclusion). For instance, if you are trying to 
prove 

Show that if T : V → W is linear, then
T(cx + y) = cTx + Ty for all x, y ∈ V and all scalars c


then your hypothesis is that T : V → W is a linear transformation (so
that T(x + y) = Tx + Ty and T(cx) = cTx for all x, y ∈ V and all
scalars c), and your objective is to prove that for any vectors x, y ∈ V
and any scalar c, we have T(cx + y) = cTx + Ty.


e On the other hand, if you are trying to prove that 




Show that if T : V → W is such that T(cx + y) = cTx + Ty
for all x, y ∈ V and all scalars c, then T is linear.


then your hypotheses are that T maps V to W, and that T(cx + y) =
cTx + Ty for all x, y ∈ V and all scalars c, and your objective is to
prove that T is linear, i.e. that T(x + y) = Tx + Ty for all vectors
x, y ∈ V, and that T(cx) = cTx for all vectors x ∈ V and scalars c.
This is a completely different problem from the previous one. (Part 2. 
of Question 7 of 2.2 requires you to prove both of these implications, 
because it is an “if and only if” question). 


In your proofs, it may be a good idea to identify which of the statements 
that you write down are things that you know from the hypotheses, and 
which ones are those that you want. Little phrases like “We are given 
that”, “It follows that”, or “We need to show that” in the proof are very 
helpful, and will help convince the grader that you actually know what 
you are doing! (provided that those phrases are being used correctly, 
of course). 


“For all” versus “For some”. Sometimes it is really important to
read the “fine print” of a question - it is all too easy to jump to the
equations without reading all the English words which surround those
equations. For instance, the statement

“Show that Tv = 0 for some non-zero v ∈ V”


is completely different from

“Show that Tv = 0 for all non-zero v ∈ V”


In the first case, one only needs to exhibit a single non-zero vector v
in V for which Tv = 0; this is a statement which could be proven by
an example. But in the second case, no amount of examples will help;
one has to show that Tv = 0 for every non-zero vector v. In the second
case, probably what one would have to do is start with the hypotheses
that v ∈ V and that v ≠ 0, and somehow work one’s way to proving
that Tv = 0.


Because of this, it is very important that you read the question carefully 
before answering. If you don’t understand exactly what the question 




is asking, you are unlikely to write anything for the question that the 
grader will find meaningful. 




Math 115A - Week 6 
Textbook sections: 3.1-5.1 
Topics covered: 


e Review: Row and column operations on matrices 

e Review: Rank of a matrix 

e Review: Inverting a matrix via Gaussian elimination 
e Review: Determinants 


* * * * *


Review: Row and column operations on matrices 


e We now quickly review some material from Math 33A which we will 
need later in this course. The first concept we will need is that of an 
elementary row operation. 


• An elementary row operation takes an m × n matrix as input and returns
a different m × n matrix as output. (In other words, each elementary
row operation is a map from M_{m×n}(R) to M_{m×n}(R).) There are three
types of elementary row operations:


• Type 1 (row interchange). This type of row operation interchanges
row i with row j for some i, j ∈ {1,2,...,m}. For instance, the
operation of interchanging rows 2 and 4 in a 4 × 3 matrix would change

\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix} \]

to

\[ \begin{pmatrix} 1 & 2 & 3 \\ 10 & 11 & 12 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{pmatrix}. \]



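Observe that the final matrix can be obtained from the initial matrix
by multiplying on the left by the 4 × 4 matrix

\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \]

which is the identity matrix with rows 2 and 4 switched. (Why?) Thus
for instance

\[ \begin{pmatrix} 1 & 2 & 3 \\ 10 & 11 & 12 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix} \]

and more generally, if A is a 4 × 3 matrix, then the interchange of rows
2 and 4 replaces A with EA. We refer to E as a 4 × 4 elementary
matrix of type 1.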


Also observe that row interchange is its own inverse; if one replaces A
with EA, and then replaces EA with EEA (i.e. we interchange rows
2 and 4 twice), we get back to where we started, because EE = I_4.


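Type 2 (row multiplication). This type of elementary row operation
takes a row i and multiplies it by a non-zero scalar c. For instance, the
elementary row operation that multiplies row 2 by 10 would map

\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix} \]

to

\[ \begin{pmatrix} 1 & 2 & 3 \\ 40 & 50 & 60 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix}. \]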




This operation is the same as multiplying on the left by the matrix

\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

which is what one gets by starting with the identity matrix and
multiplying row 2 by 10. (Why?) We call E an example of a 4 × 4 elementary
matrix of type 2.


Row multiplication is invertible; the operation of multiplying a row i
by a non-zero scalar c is inverted by multiplying the row i by the non-zero
scalar 1/c. In the above example, the inverse operation is given by

\[ E^{-1} := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1/10 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

i.e. to invert the operation of multiplying row 2 by 10, we then multiply
row 2 by 1/10.


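Type 3 (row addition). For this row operation we need two rows i,
j, and a scalar c. The row operation adds c multiples of row i to row
j. For instance, if one were to add 10 multiples of row 2 to row 3, then

\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix} \]

would become

\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 47 & 58 & 69 \\ 10 & 11 & 12 \end{pmatrix}. \]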




Equivalently, this row operation amounts to multiplying the original
matrix on the left by the matrix

\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 10 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

which is what one gets by starting with the identity matrix and adding
10 copies of row 2 to row 3. (Why?) We call E an example of a 4 × 4
elementary matrix of type 3. It has an inverse

\[ E^{-1} := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -10 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}; \]

thus to invert the operation of adding 10 copies of row 2 to row 3, we
subtract 10 copies of row 2 from row 3 instead.


Thus to summarize: there are special matrices, known as elementary 
row matrices, and an elementary row operation amounts to multiply- 
ing a given matrix on the left by one of these elementary row matrices. 
Each of the elementary row matrices is invertible, and the inverse of an 
elementary matrix is another elementary matrix. (In the above discus- 
sion, we only verified this for specific examples of elementary matrices, 
but it is easy to see that the same is true for general elementary ma- 
trices. See the textbook). 
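
The three types of elementary row matrices, and the fact that each is inverted by a matrix of the same type, can be checked numerically. The following is a minimal sketch in Python; the helper names (identity, matmul, interchange, multiply, add) are made up for this illustration, and matrices are stored as lists of rows.

```python
from fractions import Fraction

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Rows are numbered from 1, as in the notes; the helpers convert to 0-based.
def interchange(n, i, j):
    """Type 1: the identity matrix with rows i and j interchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def multiply(n, i, c):
    """Type 2: the identity matrix with row i multiplied by c (c non-zero)."""
    E = identity(n)
    E[i - 1][i - 1] = c
    return E

def add(n, i, j, c):
    """Type 3: the identity matrix with c copies of row i added to row j (i != j)."""
    E = identity(n)
    E[j - 1][i - 1] = c
    return E

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

print(matmul(interchange(4, 2, 4), A))  # [[1, 2, 3], [10, 11, 12], [7, 8, 9], [4, 5, 6]]
print(matmul(multiply(4, 2, 10), A))    # [[1, 2, 3], [40, 50, 60], [7, 8, 9], [10, 11, 12]]
print(matmul(add(4, 2, 3, 10), A))      # [[1, 2, 3], [4, 5, 6], [47, 58, 69], [10, 11, 12]]

# Each operation is undone by an elementary matrix of the same type.
I4 = identity(4)
print(matmul(interchange(4, 2, 4), interchange(4, 2, 4)) == I4)           # True
print(matmul(multiply(4, 2, Fraction(1, 10)), multiply(4, 2, 10)) == I4)  # True
print(matmul(add(4, 2, 3, -10), add(4, 2, 3, 10)) == I4)                  # True
```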


There are also elementary column operations, which are very similar to 
elementary row operations, but arise from multiplying a matrix on the 
right by an elementary matrix, instead of the left. For instance, if one 
multiplies a matrix A with 4 columns on the right by the elementary 
matrix 


\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \]




then this amounts to switching column 2 and column 4 of A (why?). 
This is a type 1 (column interchange) elementary column move. Simi- 
larly, if one multiplies A on the right by 


\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]


then this amounts to multiplying column 2 of A by 10 (why?). This is 
a type 2 (column multiplication) elementary column move. Finally, if 
one multiplies A on the right by 


\[ E := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 10 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]


then this amounts to adding 10 copies of column 3 to column 2 (why?). 
This is a type 3 (column addition) elementary column move. 
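
The three "(why?)" questions above can be explored by multiplying a sample matrix on the right by each of the three elementary matrices. The sketch below is in Python; the 2 × 4 matrix A is an arbitrary example chosen here, and matmul is an ad hoc helper.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]          # any matrix with 4 columns would do

swap_cols_2_4 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
scale_col_2   = [[1, 0, 0, 0], [0, 10, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
add_col_3_to_2 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 10, 1, 0], [0, 0, 0, 1]]

print(matmul(A, swap_cols_2_4))   # [[1, 4, 3, 2], [5, 8, 7, 6]]
print(matmul(A, scale_col_2))     # [[1, 20, 3, 4], [5, 60, 7, 8]]
print(matmul(A, add_col_3_to_2))  # [[1, 32, 3, 4], [5, 76, 7, 8]]
```

Note that the third matrix is the same one that, when applied on the left, adds 10 copies of row 2 to row 3; on the right it instead adds 10 copies of column 3 to column 2.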


Elementary row (or column) operations have several uses. One
important use is to simplify systems of linear equations of the form Ax = b,
where A is some matrix and x, b are vectors. If E is an elementary
matrix, then the equation Ax = b is equivalent to EAx = Eb (why are
these two equations equivalent? Hint: E is invertible). Thus one can
simplify the equation Ax = b by performing the same row operation
to both A and b simultaneously (one can concatenate A and b into a
single (artificial) matrix in order to do this). Eventually one can use
row operations to reduce A to row-echelon form (in which each row is 
either zero, or begins with a 1, and below each such 1 there are only 
zeroes), at which point it becomes straightforward to solve for Ax = b 
(or to determine that there are no solutions). We will not review this 
procedure here, because it will not be necessary for this course; see 
however the textbook (or your Math 33A notes) for more information. 
However, we will remark that every matrix can be row-reduced into 
row-echelon form (though there is usually more than one way to do 
so). 




e Another purpose of elementary row or column operations is to deter- 
mine the rank of a matrix, which is a more precise measurement of its 
invertibility. This will be the purpose of the next section. 


* * * * *


Rank of a matrix 


• Recall that the rank of a linear transformation T : V → W is the
dimension of its range R(T). Rank has a number of uses, for instance 
it can be used to tell whether a linear transformation is invertible: 


• Lemma. Let V and W be finite-dimensional spaces with dim(V) =
dim(W) = n. Let T : V → W be a linear transformation. Then T is
invertible if and only if rank(T) = n (i.e. T has the maximum rank
possible).


• Proof. If rank(T) = n, then R(T) has the same dimension n as W.
But R(T) is a subspace of W, so this forces R(T) to actually equal
W (see Theorem 2 of Week 2 notes). Thus T is onto. Also, from the
dimension theorem we see that nullity(T) = 0, and so T is one-to-one.
Thus T is invertible.


e Conversely, if T is invertible, then it is one-to-one, hence nullity(T) = 0, 
and hence by the dimension theorem rank(T) = n. 


e One interesting thing about rank is that if you multiply a linear trans- 
formation on the left or right by an invertible transformation, then the 
rank doesn’t change: 


• Lemma 1. Let T : V → W be a linear transformation from one
finite-dimensional space to another, let S : U → V be an invertible
transformation, and let Q : W → Z be another invertible
transformation. Then


rank(T) = rank(QT) = rank(TS) = rank(QTS).

• Proof. First let us show that rank(T) = rank(TS). To show this,
we first compute the ranges of T : V → W and TS : U → W. By
definition of range, R(T) = T(V), the image of V under T. Similarly,




R(TS) = TS(U). But since S : U → V is invertible, it is onto, and so
S(U) = V. Thus


R(TS) = TS(U) = T(S(U)) = T(V) = R(T)


and so
rank(TS) = rank(T).


A similar argument gives that rank(QTS) = rank(QT) (just replace
T by QT in the above). To finish the argument we need to show that
rank(QT) = rank(T). We compute ranges again:


R(QT) = QT(V) = Q(T(V)) = Q(R(T)),


so that
rank(QT) = dim(Q(R(T))).


But since Q is invertible, we have
dim(Q(R(T))) = dim(R(T)) = rank(T)
(see Q3 of the midterm!). Thus rank(QT) = rank(T) as desired.


Now to compute the rank of an arbitrary linear transformation can get 
messy. The best way to do this is to convert the linear transform to a 
matrix, and compute the rank of that. 


Definition. If A is a matrix, then the rank of A, denoted rank(A), is
defined by rank(A) = rank(L_A).


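Example. Consider the matrix

\[ A := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]

Then L_A is the transformation

\[ L_A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \end{pmatrix}. \]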



The range of this operator is thus three-dimensional (why?) and so the 
rank of A is 3. 


Let A be an m × n matrix, so that L_A maps R^n to R^m. Let (e_1, e_2, ..., e_n)
be the standard basis for R^n. Since e_1, ..., e_n span R^n, we see that
L_A(e_1), L_A(e_2), ..., L_A(e_n) span R(L_A) (see Theorem 3 of Week 3 notes).
Thus the rank of A is the dimension of the space spanned by
L_A(e_1), L_A(e_2), ..., L_A(e_n). But L_A(e_1) is simply the first column of A
(why?), L_A(e_2) is the second column of A, etc. Thus we have shown


Lemma 2. The rank of a matrix A is equal to the dimension of the 
space spanned by its columns. 


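Example. If A is the matrix used in the previous example, then the
rank of A is the dimension of the span of the columns

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}; \]

this span is easily seen to be three-dimensional.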


As one corollary of this Lemma, if only k of the columns of a matrix 
are non-zero, then the rank of the matrix can be at most k (though it 
could be smaller than k; can you think of an example?). 


This Lemma does not necessarily make computing the rank easy, be- 
cause finding the dimension of the space spanned by the columns could 
be difficult. However, one can use elementary row or column operations 
to simplify things. From Lemma 1 we see that if E is an elementary
matrix and A is a matrix, then EA or AE has the same rank as A 
(provided that the matrix multiplication makes sense, of course). Thus 
elementary row or column operations do not change the rank. Thus, we 
can use these operations to simplify the matrix into row-echelon form. 


Lemma 3. Let A be a matrix in row-echelon form (thus every row is 
either zero, or begins with a 1, and each of those 1s has nothing but 0s 
below it). Then the rank of A is equal to the number of non-zero rows. 




• Proof. Let’s say that A has k non-zero rows, so that we have to show
that A has rank k. Each column of A can only have the top k entries
non-zero; all entries below that must be zero. Thus the span of the
columns of A is contained in the k-dimensional space

\[ V := \left\{ \begin{pmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{pmatrix} : x_1, \ldots, x_k \in \mathbf{R} \right\}, \]

and so the rank is at most k.


• Now we have to show that the rank is at least k. To do this it will
suffice to show that every vector in V is in the span of the columns of
A, since this will mean that the span of the columns of A is exactly
the k-dimensional space V. So, let us pick a vector

\[ v := \begin{pmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{pmatrix} \]

in V. Since A is in row-echelon form, the k-th row of A must contain a 1
somewhere, which means that there is a column whose k-th entry is 1
(with all entries below that equal to 0). If we subtract x_k multiples of
this column from v, then we get a new vector whose k-th entry (and all
the ones below it) are zero.


• Now we look at the (k - 1)-th row. Again, since we are in row-echelon
form, there is a 1 somewhere, with 0s below it; this implies that there is
a column whose (k - 1)-th entry is 1 (with all entries below that equal to
0). Thus we can subtract a multiple of this column to get a new vector
whose (k - 1)-th entry (and all the ones below it) are zero.


e Continuing in this fashion we can subtract multiples of various columns 
of A from v until we manage to zero out all the entries. In other words, 
we have expressed v as a linear combination of columns in A, and hence 




v is in the span of the columns. Thus the span of the columns is exactly 
V, and we are done. 


Thus we now have a procedure to compute the rank of a matrix: we 
row reduce (or column reduce) until we reach row-echelon form, and 
then just count the number of non-zero rows. (Actually, one doesn’t 
necessarily have to reduce all the way to row-echelon form; it may be 
that the rank becomes obvious some time before then, because the span 
of the columns can be determined by inspection). 
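
The procedure just described (row reduce, then count the non-zero rows) can be sketched in Python as follows. This is an illustrative implementation rather than anything from the textbook; it uses exact fractions to avoid rounding issues, and the function names are chosen here.

```python
from fractions import Fraction

def row_echelon(A):
    """Return a row-echelon form of A, using only elementary row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0]) if M else 0
    pivot_row = 0
    for col in range(cols):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]        # type 1
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]        # type 2
        for r in range(pivot_row + 1, rows):                   # type 3
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

def rank(A):
    """Number of non-zero rows in a row-echelon form of A (see Lemma 3)."""
    return sum(1 for row in row_echelon(A) if any(x != 0 for x in row))

print(rank([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]))  # 3
print(rank([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]))           # 2
```

The second matrix has rank 2 because rows 3 and 4 become zero once multiples of the first two rows are subtracted.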


If one only uses elementary row operations, then usually one cannot 
hope to make the matrix much simpler than row-echelon form. But if 
one is allowed to also use elementary column operations, then one can 
get the matrix into a particularly simple form: 


Theorem 4. Let A be an m × n matrix with rank r. Then one can
use elementary row and column operations to place A in the form

\[ \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}, \]

where I_r is the r × r identity matrix, and 0_{m×n} is the m × n zero matrix.


(Thus, we have reduced the matrix to nothing but a string of r ones
down the diagonal, with everything else being zero.)


Proof To begin with, we know that we can use elementary row opera- 
tions to place A in row-echelon form. Thus the first r rows begin with 
a 1, with 0s below the 1, while the remaining m - r rows are entirely
zero. 


Now consider the first row of this reduced matrix; let’s suppose that it is 
not identically zero. After some zeroes, it has a 1, and then some other 
entries which may or may not be zero. But by subtracting multiples of 
the column with 1 in it from the columns with other non-zero entries 
(i.e. type 3 column operations), we can make all the other entries in 
this row equal to zero. Note that these elementary column operations 
only affect the top row, leaving the other rows unchanged, because the 
column with 1 in it has 0s everywhere else. 




e A similar argument then allows one to take the second row of the 
reduced matrix and make all the entries (apart from the leading 1) 
equal to 0. And so on and so forth. At the end of this procedure, we 
get that the first r rows each contain one 1 and everything else being 
zero. Furthermore, these 1s have 0s both above and below, so they lie 
in different columns. Thus by switching columns appropriately (using 
type 1 column operations) we can get into the form required for the 
Theorem. 


• Let A be an m × n matrix with rank r. Every elementary row
operation corresponds to multiplying A on the left by an m × m elementary
matrix, while every elementary column operation corresponds to
multiplying A on the right by an n × n elementary matrix. Thus by Theorem
4, we have

\[ E_1 E_2 \cdots E_a A F_1 F_2 \cdots F_b = \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}, \]

where E_1, ..., E_a are elementary m × m matrices, and F_1, ..., F_b are
elementary n × n matrices. All the elementary matrices are invertible.
After some matrix algebra, this becomes

\[ A = E_a^{-1} \cdots E_2^{-1} E_1^{-1} \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix} F_b^{-1} \cdots F_2^{-1} F_1^{-1} \]

(why did the order of the matrices get reversed?). We have thus proven


• Proposition 5. Let A be an m × n matrix with rank r. Then we have
an m × m matrix B which is a product of elementary matrices and an
n × n matrix C, also a product of elementary matrices, such that

\[ A = B \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix} C. \]


e Note (from Q6 of Assignment 4) that the product of invertible matrices 
is always invertible. Thus the matrices B and C above are invertible.


e Proposition 5 is an example of a factorization theorem, which takes a 
general matrix and factors it into simpler pieces. There are many other 
examples of factorization theorems which you will encounter later in 
the 115 sequence, and they have many applications. 
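
To see the factorization at work on a concrete matrix, here is a sketch in Python that carries out the row and column operations from the proof of Theorem 4 while recording them. The function normal_form and its return convention are inventions of this sketch: it returns invertible matrices P and Q (the accumulated row and column operations) and the block matrix N with PAQ = N, which is the statement of Proposition 5 with B = P^{-1} and C = Q^{-1}.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def normal_form(A):
    """Return (P, N, Q) with P*A*Q == N, where N has I_r in the top left corner."""
    m, n = len(A), len(A[0])
    N = [[Fraction(x) for x in row] for row in A]
    P, Q = identity(m), identity(n)   # P records row operations, Q column operations
    r = 0
    while r < min(m, n):
        # Look for a non-zero entry among rows >= r and columns >= r.
        pos = next(((i, j) for i in range(r, m) for j in range(r, n)
                    if N[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        # Type 1: move that entry to position (r, r).
        N[r], N[i] = N[i], N[r]
        P[r], P[i] = P[i], P[r]
        for M in (N, Q):
            for row in M:
                row[r], row[j] = row[j], row[r]
        # Type 2: scale row r so that the pivot becomes 1.
        lead = N[r][r]
        N[r] = [x / lead for x in N[r]]
        P[r] = [x / lead for x in P[r]]
        # Type 3 on rows: clear the other entries of column r.
        for k in range(m):
            if k != r and N[k][r] != 0:
                f = N[k][r]
                N[k] = [x - f * y for x, y in zip(N[k], N[r])]
                P[k] = [x - f * y for x, y in zip(P[k], P[r])]
        # Type 3 on columns: clear the other entries of row r.
        for c in range(n):
            if c != r and N[r][c] != 0:
                f = N[r][c]
                for M in (N, Q):
                    for row in M:
                        row[c] -= f * row[r]
        r += 1
    return P, N, Q

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
P, N, Q = normal_form(A)
print(N == [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]])  # True
print(matmul(matmul(P, A), Q) == N)                       # True
```

The number of 1s on the diagonal of N is the rank of A (here 2).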




e Some more properties of rank. We know from Lemma 1 that rank of a 
linear transformation is unchanged by multiplying on the left or right 
by invertible transformations. Given the close relationship between 
linear transformations and matrices, it is unsurprising that the same 
thing is true for matrices: 


• Lemma 6. Let A be an m × n matrix, B be an m × m invertible
matrix, and C be an n × n invertible matrix. Then


rank(A) = rank(BA) = rank(AC) = rank(BAC).


• Proof. Since B is invertible, so is L_B (see Theorem 10 from Week 4
notes). Similarly L_C is invertible. From Lemma 1 we have


rank(L_A) = rank(L_B L_A) = rank(L_A L_C) = rank(L_B L_A L_C).

Since L_B L_A = L_{BA}, etc. we thus have


rank(L_A) = rank(L_{BA}) = rank(L_{AC}) = rank(L_{BAC}).


The claim then follows from the definition of rank for matrices. 


• Note that Lemma 6 is consistent with Proposition 5, since the matrix

\[ \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix} \]

has rank r.


• One important consequence of the above theory concerns the rank of
a transpose A^t of a matrix A. Recall that the transpose of an m × n
matrix is the n × m matrix obtained by reflecting around the diagonal,
so for instance

\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{pmatrix}^t = \begin{pmatrix} 1 & 3 & 5 & 7 \\ 2 & 4 & 6 & 8 \end{pmatrix}. \]

Thus transposes swap rows and columns. From the definition of matrix
multiplication it is easy to verify the identity (AB)^t = B^t A^t (why?). In
particular, if A is invertible, then I = I^t = (A A^{-1})^t = (A^{-1})^t A^t, which
implies that A^t is also invertible.



Lemma 7. Let A be an m × n matrix with rank r. Then A^t has the
same rank as A.


Proof. From Proposition 5 we have

\[ A = B \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix} C. \]

Taking transposes of both sides we obtain

\[ A^t = C^t \begin{pmatrix} I_r & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{pmatrix} B^t. \]


The inner matrix on the right-hand side has rank r. Since B and C are
invertible, so are B^t and C^t, and so by Lemma 6 A^t has rank r, and we
are done.


From Lemma 7 and Lemma 2 we thus have 


Corollary 8. The rank of a matrix is equal to the dimension of the 
space spanned by its rows. 


As one corollary of this Lemma, if only k of the rows of a matrix are 
non-zero, then the rank of the matrix can be at most k. 
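
Lemma 7 and Corollary 8 say that the rank can be computed from the rows just as well as from the columns. A quick numerical check, as a sketch in Python with a compact row-reduction helper written for this illustration:

```python
from fractions import Fraction

def rank(A):
    """Rank by row reduction: count the pivots found."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(rank(A), rank(transpose(A)))  # 2 2

# Only one non-zero row, so the rank is at most 1.
print(rank([[0, 0, 0], [1, 2, 3], [0, 0, 0]]))  # 1
```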


Finally, we give a way to compute the rank of any linear transformation 
from one finite-dimensional space to another. 


Lemma 9. Let T : V → W be a linear transformation, and let β and γ
be finite bases for V and W respectively. Then rank(T) = rank([T]_β^γ).


Proof. Suppose V is n-dimensional and W is m-dimensional. Then
the co-ordinate map φ_β : V → R^n defined by φ_β(v) := [v]^β is an
isomorphism, as is the co-ordinate map φ_γ : W → R^m defined by
φ_γ(w) := [w]^γ. Meanwhile, the map L_{[T]_β^γ} is a linear transformation
from R^n to R^m. The identity


[Tv]^γ = [T]_β^γ [v]^β


can thus be rewritten as

φ_γ(Tv) = L_{[T]_β^γ}(φ_β(v))

and thus
φ_γ T = L_{[T]_β^γ} φ_β


and hence (since φ_β is invertible)

L_{[T]_β^γ} = φ_γ T φ_β^{-1}.

Taking rank of both sides and using Lemma 1, we obtain


rank(T) = rank(L_{[T]_β^γ}) = rank([T]_β^γ)


as desired. 


• Example. Let T : P_3(R) → P_3(R) be the linear transformation

Tf := f - x df/dx,

thus for instance

T x^2 = x^2 - x(2x) = -x^2.


To find the rank of this operator, we let β := (1, x, x^2, x^3) be the
standard basis for P_3(R). A simple calculation shows that


T1 = 1; Tx = 0; Tx^2 = -x^2; Tx^3 = -2x^3,


so

\[ [T]_\beta^\beta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix}. \]

This matrix clearly has rank 3 (row operations can convert it to
row-echelon form with three non-zero rows), so T has rank 3.
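
The computation in this example can be reproduced mechanically. The sketch below (Python) stores a polynomial c0 + c1 x + c2 x^2 + c3 x^3 as the list [c0, c1, c2, c3], which is exactly its co-ordinate vector with respect to β; the function names are chosen for this illustration.

```python
from fractions import Fraction

def T(coeffs):
    """Apply T f = f - x f' to f = c0 + c1 x + c2 x^2 + c3 x^3, given as [c0, c1, c2, c3]."""
    # x * (c_k x^k)' = k c_k x^k, so T multiplies the x^k coefficient by (1 - k).
    return [(1 - k) * c for k, c in enumerate(coeffs)]

def rank(A):
    """Rank by row reduction: count the pivots found."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 1, x, x^2, x^3
# Column j of the matrix of T is the co-ordinate vector of T applied to the j-th basis vector.
columns = [T(b) for b in basis]
matrix = [list(row) for row in zip(*columns)]

print(matrix)        # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, -1, 0], [0, 0, 0, -2]]
print(rank(matrix))  # 3
```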


e The rank of a matrix measures, in some sense, how close to zero (or 
how “degenerate”) a matrix is; the only matrix with rank 0 is the 0 
matrix (why?). The largest rank an m × n matrix can have is min(m, n)
(why? See Lemma 2 and Corollary 8). For instance, a 3 x 5 matrix can 
have rank at most 3. 


KOK OK OK Ok 


Inverting matrices via row operations 




Proposition 5 has the following consequence. 


Lemma 10. Let A be an n × n matrix. Then A is invertible if and
only if it is the product of elementary n x n matrices. 


Proof First suppose that A is the product of elementary matrices. We 
already know that every elementary matrix is invertible; also from Q6 
from the Week 4 homework, we know that the product of two invertible 
matrices is also invertible. Applying this fact repeatedly we see that A 
is also invertible. 


Now suppose that A is invertible, thus L_A : R^n → R^n is invertible, and
in particular is onto. Thus R(L_A) = R^n, and so rank(L_A) = n, and so
A itself must have rank n. By Proposition 5 we thus have


A = E_1 E_2 ... E_a I_n F_1 ... F_b


where E_1, ..., E_a, F_1, ..., F_b are elementary n × n matrices. Since the
identity matrix I_n cancels out, we are done.


This gives us a way to use row operations to invert a matrix A. Suppose
we manage to use a sequence of row operations E_1, E_2, ..., E_a to turn
a matrix A into the identity, thus


E_a ... E_2 E_1 A = I.

Then by multiplying both sides on the right by A^{-1} we get

E_a ... E_2 E_1 I = A^{-1}.


Thus, if we concatenate A and I together, and apply row operations on
the concatenated matrix to turn the A component into I, then the I
component will automatically turn to A^{-1}. This is a way of computing
the inverse of A.


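Example. Suppose we want to invert the matrix

\[ A := \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. \]

We combine A and the identity I_2 into a single matrix:

\[ (A | I_2) = \left( \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array} \right). \]


Then we row reduce to turn the left matrix into the identity. For
instance, by subtracting three copies of row 1 from row 2 we obtain

\[ \left( \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array} \right) \]

and then by adding row 2 to row 1 we obtain

\[ \left( \begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & -2 & -3 & 1 \end{array} \right). \]

Dividing the second row by -2 we obtain

\[ \left( \begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & 3/2 & -1/2 \end{array} \right). \]

Thus the inverse of A is

\[ A^{-1} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix} \]

since the elementary transformations which convert A to I_2 also
convert I_2 to A^{-1}.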
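
The same method can be written as a short program. The Python sketch below row-reduces the concatenated matrix (A | I) with exact fractions; it applies the row operations in a slightly different order from the hand computation above (it normalizes each pivot before clearing its column), but any sequence of row operations that turns A into I produces the same inverse.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing the concatenated matrix (A | I)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]                 # type 1
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]                 # type 2
        for r in range(n):                                  # type 3
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # The left half is now I; the right half is the inverse.
    return [row[n:] for row in M]

A_inv = inverse([[1, 2], [3, 4]])
print(A_inv == [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]])  # True
```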


* * * * *


Determinants 


e We now review a very useful characteristic of matrices - the determi- 
nant of a matrix. The determinant of a square (n x n) matrix is a 
number which depends in a complicated way on the entries of that 
matrix. Despite the complicated definition, it has some very remark- 
able properties, especially with regard to matrix multiplication, and 
row and column operations. Unfortunately we will not be able to give 
the proofs of many of these remarkable properties here; the best way 
to understand determinants is by means of something called exterior 




algebra, which is beyond the scope of this course. Without the tools 
of exterior algebra (in particular, something called a wedge product), 
proving any of the fundamental properties of determinants becomes 
very messy. So we will settle just for describing the determinant and 
stating its basic properties. 


The determinant of a 1 x 1 matrix is just its entry: 


det(a) = a. 


The 2 × 2 determinant is given by the formula

\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} := ad - bc. \]


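The n × n determinant is messier, and is defined in the following strange
way. For any row i and column j, define the minor Ã_{ij} of an n × n
matrix A to be the (n - 1) × (n - 1) matrix which is A with the i-th row
and j-th column removed. For instance, if

\[ A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]

then

\[ \tilde A_{11} = \begin{pmatrix} e & f \\ h & i \end{pmatrix}, \quad \tilde A_{12} = \begin{pmatrix} d & f \\ g & i \end{pmatrix}, \quad \tilde A_{13} = \begin{pmatrix} d & e \\ g & h \end{pmatrix}, \]

etc.


This should not be confused with A_{ij}, which is the entry of A in the
i-th row and j-th column. For instance, in the above example A_{11} = a
and A_{12} = b.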


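We can now define the n × n determinant recursively in terms of the
(n - 1) × (n - 1) determinant by what is called the cofactor expansion.
To find the determinant of an n × n matrix A, we pick a row i (any row
will do) and set

\[ \det(A) := \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det(\tilde A_{ij}). \]


For instance, we have

\[ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = a \det \begin{pmatrix} e & f \\ h & i \end{pmatrix} - b \det \begin{pmatrix} d & f \\ g & i \end{pmatrix} + c \det \begin{pmatrix} d & e \\ g & h \end{pmatrix} \]

or

\[ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = -d \det \begin{pmatrix} b & c \\ h & i \end{pmatrix} + e \det \begin{pmatrix} a & c \\ g & i \end{pmatrix} - f \det \begin{pmatrix} a & b \\ g & h \end{pmatrix} \]

or

\[ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = g \det \begin{pmatrix} b & c \\ e & f \end{pmatrix} - h \det \begin{pmatrix} a & c \\ d & f \end{pmatrix} + i \det \begin{pmatrix} a & b \\ d & e \end{pmatrix}. \]


It seems that this definition depends on which row you use to perform
the cofactor expansion, but the amazing thing is that it doesn’t! For
instance, in the above example, any of the three computations will lead
to the same answer, namely


aei - ahf - bdi + bgf + cdh - cge.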


We would like to explain why it doesn’t matter which row (or column;
see below) one uses to perform cofactor expansion, but it would require one to
develop some new material (on the signature of permutations) which 
would take us far afield, so we will regretfully skip the derivation. 


The quantities det(Ã_{ij}) are sometimes known as cofactors. As one
can imagine, this cofactor formula becomes extremely messy for large 
matrices (to compute the determinant of an n x n matrix, the above 
formula will eventually require us to add or subtract n! terms together!); 
there are easier ways to compute using row and column operations 
which we will describe below. 
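
The recursive definition translates almost word for word into code. The following Python sketch (function names chosen here) expands along a chosen row of the matrix and uses the first row for the smaller determinants; it is meant to illustrate the definition, not to be efficient, since it performs on the order of n! multiplications.

```python
def minor(A, i, j):
    """The matrix A with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A, i=0):
    """Determinant by cofactor expansion along row i (0-based) of A."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1)**(i + j) has the same sign for 0-based and for 1-based indices.
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(n))

A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 2]]

# Expanding along any of the three rows gives the same answer.
print(det(A, 0), det(A, 1), det(A, 2))  # 6 6 6
```

For this matrix the formula aei - ahf - bdi + bgf + cdh - cge gives 12 - 4 - 0 + 0 + 1 - 3 = 6, in agreement with the output.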


One can also perform cofactor expansion along a column j instead of
a row i:

\[ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det(\tilde A_{ij}). \]




This ultimately has to do with a symmetry property det(A^t) = det(A)
for the determinant, although this symmetry is far from obvious given 
the definition. 


A special case of cofactor expansion: if A has the form

$$A = \begin{pmatrix} c & 0 \cdots 0 \\ \vdots & B \end{pmatrix},$$

where the $0 \cdots 0$ are a string of $n-1$ zeroes, the $\vdots$ represents a column
vector of length $n-1$, and B is an $(n-1) \times (n-1)$ matrix, then $\det(A) =
c \det(B)$ (expand along the first row). In particular, from this and
induction we see that the identity matrix always has determinant 1:
$\det(I_n) = 1$. Also, we see that the determinant of a lower-triangular
matrix is just the product of the diagonal entries; for instance

$$\det\begin{pmatrix} a & 0 & 0 \\ d & e & 0 \\ g & h & i \end{pmatrix} = aei.$$


Because of the symmetry property we also know that upper-triangular
matrices work the same way:

$$\det\begin{pmatrix} a & b & c \\ 0 & e & f \\ 0 & 0 & i \end{pmatrix} = aei.$$


So in particular, diagonal matrices have a determinant which is just
multiplication along the diagonal:

$$\det\begin{pmatrix} a & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & i \end{pmatrix} = aei.$$


An $n \times n$ matrix can be thought of as a collection of n row vectors in
$\mathbf{R}^n$, or as a collection of n column vectors in $\mathbf{R}^n$. Thus one can talk
about the determinant $\det(v_1, \ldots, v_n)$ of n column vectors in $\mathbf{R}^n$, or
the determinant of n row vectors in $\mathbf{R}^n$, simply by arranging those n
row or column vectors into a matrix. Note that the order in which we
arrange these vectors matters (see the anti-symmetry property below).




• Example: the determinant of the two vectors $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ is
$ad - bc$.


• The determinant of n vectors has two basic properties. One is that it
is linear in each column separately. What we mean by this is that

$$\det(v_1, \ldots, v_{j-1}, v_j + w_j, v_{j+1}, \ldots, v_n) = \det(v_1, \ldots, v_{j-1}, v_j, v_{j+1}, \ldots, v_n) + \det(v_1, \ldots, v_{j-1}, w_j, v_{j+1}, \ldots, v_n)$$

and

$$\det(v_1, \ldots, v_{j-1}, c v_j, v_{j+1}, \ldots, v_n) = c \det(v_1, \ldots, v_{j-1}, v_j, v_{j+1}, \ldots, v_n).$$

This linearity can be seen most easily by cofactor expansion in the
column j.


• The other basic property it has is anti-symmetry: if one switches
two column vectors (not necessarily adjacent), then the determinant
changes sign. For instance, when n = 5,

$$\det(v_1, v_5, v_3, v_4, v_2) = -\det(v_1, v_2, v_3, v_4, v_5).$$

This is not completely obvious from the cofactor expansion definition,
although the presence of the factor $(-1)^{i+j}$ does suggest that some sort
of sign change might occur when one switches rows or columns. We
will not prove this anti-symmetry property here.


• (It turns out that the determinant is in fact the only expression which
obeys these two properties, and which also has the property that the 
identity matrix has determinant one. But we will not prove that here). 


• The same facts hold if we replace columns by rows; i.e. the determinant
is linear in each of the rows separately, and if one switches two rows 
then the determinant changes sign. 


• We now write down some properties relating to how determinants
behave under elementary row operations.




Property 1 If A is an $n \times n$ matrix, and B is the matrix A with two
distinct rows i and j interchanged, then $\det(B) = -\det(A)$. (I.e. row
operations of the first type flip the sign of the determinant). This is
just a restatement of the antisymmetry property for rows.

Example:

$$\det\begin{pmatrix} c & d \\ a & b \end{pmatrix} = -\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Corollary of Property 1: If two rows of a matrix A are the same, then
the determinant must be zero. (Swapping the two equal rows leaves A
unchanged but flips the sign of the determinant, so $\det(A) = -\det(A)$.)


Property 2 If A is an $n \times n$ matrix, and B is the matrix A but with
one row i multiplied by a scalar c, then $\det(B) = c \det(A)$. (I.e. row
operations of the second type multiply the determinant by whatever
scalar was used in the row operation). This is a special case of the
linearity property for the $i^{th}$ row.

Example:

$$\det\begin{pmatrix} ka & kb \\ c & d \end{pmatrix} = k \det\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Corollary of Property 2: if a matrix A has one of its rows equal to zero,
then $\det(A)$ is zero (just apply this Property with c = 0).


Property 3 If A is an $n \times n$ matrix, and B is the matrix A but with c
copies of one row i added to another row j, then $\det(B) = \det(A)$. (I.e.
row operations of the third type do not affect the determinant). This is
a consequence of the linearity property for the $j^{th}$ row, combined with
the Corollary to Property 1 (why?).

Example:

$$\det\begin{pmatrix} a + kc & b + kd \\ c & d \end{pmatrix} = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Similar properties hold for elementary column operations (just replace
“row” by “column” throughout in the above three properties).
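The three properties can be tested on a concrete matrix. The sketch below is only an illustration under assumed names: `det` is a small cofactor-expansion helper written for this example, and the scalars 5 and 7 are arbitrary.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


A = [[1, 2, 3],
     [2, 4, 4],
     [3, 2, 1]]

# Property 1: interchange rows 1 and 2 -> the determinant changes sign.
swapped = [A[1], A[0], A[2]]

# Property 2: multiply row 2 by c = 5 -> the determinant is multiplied by 5.
scaled = [A[0], [5 * x for x in A[1]], A[2]]

# Property 3: add 7 copies of row 1 to row 3 -> the determinant is unchanged.
sheared = [A[0], A[1], [z + 7 * x for x, z in zip(A[0], A[2])]]

print(det(A), det(swapped), det(scaled), det(sheared))  # prints -8 8 -40 -8
```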


We can summarize the above three properties in the following lemma
(which will soon be superseded by a more general statement):




• Lemma 11. If E is an elementary matrix, then $\det(EA) = \det(E) \det(A)$
and $\det(AE) = \det(A) \det(E)$.


• This is because the determinant of a type 1 elementary matrix is easily
seen to be -1 (from Property 1 applied to the identity matrix), the 
determinant of a type 2 elementary matrix (multiplying a row by c) is 
easily seen to be c (from Property 2 applied to the identity matrix), 
and the determinant of a type 3 elementary matrix is easily seen to 
be 1 (from Property 3 applied to the identity matrix). In particular, 
elementary matrices always have non-zero determinant (recall that in 
the type 2 case, c must be non-zero). 


• We are now ready to state one of the most important properties of a
determinant: it detects whether a matrix is invertible.


• Theorem 12. An $n \times n$ matrix is invertible if and only if its
determinant is non-zero.


• Proof Suppose A is an invertible $n \times n$ matrix. Then by Lemma 10, it is
the product of elementary matrices, times the identity $I_n$. The identity
$I_n$ has a non-zero determinant: $\det(I_n) = 1$. Each elementary matrix
has non-zero determinant (see above), so by Lemma 11 if a matrix has
non-zero determinant, then after multiplying on the left or right by
an elementary matrix it still has non-zero determinant. Applying this
repeatedly we see that A must have non-zero determinant.


Now conversely suppose that A had non-zero determinant. By Lemma
11, we thus see that even after applying elementary row and column
operations to A, one must still obtain a matrix with non-zero
determinant. In particular, the row-echelon form of A must still have
non-zero determinant, which means in particular that it cannot contain
any rows which are entirely zero. Thus A has full rank n, which means that
$L_A : \mathbf{R}^n \to \mathbf{R}^n$ is onto. But then $L_A$ would also be one-to-one by the
dimension theorem - see Lemma 2 of Week 3 notes, hence $L_A$ would be
invertible and hence A is invertible.


• Not only does the determinant measure invertibility, it also measures
linear independence. 




Corollary 13. Let $v_1, \ldots, v_n$ be n column vectors in $\mathbf{R}^n$. Then
$v_1, \ldots, v_n$ are linearly dependent if and only if $\det(v_1, \ldots, v_n) = 0$.

Proof Suppose that $\det(v_1, \ldots, v_n) = 0$, so that by Theorem 12 the
$n \times n$ matrix $(v_1, \ldots, v_n)$ is not invertible. Then the linear
transformation $L_{(v_1, \ldots, v_n)}$ cannot be one-to-one, and so there is a
non-zero vector

$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$

in the null space, i.e.

$$(v_1, \ldots, v_n) \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = 0$$

or in other words

$$a_1 v_1 + \ldots + a_n v_n = 0,$$

i.e. $v_1, \ldots, v_n$ are not linearly independent. The converse statement
follows by reversing all the above steps and is left to the reader.


Note that if one writes down a typical $n \times n$ matrix, then the
determinant will in general just be some random number and will usually not
be zero. So “most” matrices are invertible, and “most” collections of n
vectors in $\mathbf{R}^n$ are linearly independent (and hence form a basis for $\mathbf{R}^n$,
since $\mathbf{R}^n$ is n-dimensional).


Properties 1,2,3 also give a way to compute the determinant of a ma- 
trix - use row and column operations to convert it into some sort of 
triangular or diagonal form, for which the determinant is easy to com- 
pute, and then work backwards to recover the original determinant of 
the matrix. 


Example. Suppose we wish to compute the determinant of

$$A := \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 4 \\ 3 & 2 & 1 \end{pmatrix}.$$

We perform row operations. Subtracting two copies of row 1 from row
2 and using Property 3, we obtain

$$\det(A) = \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & -2 \\ 3 & 2 & 1 \end{pmatrix}.$$

Similarly subtracting three copies of row 1 from row 3, we obtain

$$\det(A) = \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & -2 \\ 0 & -4 & -8 \end{pmatrix}.$$

Multiplying the third row by $-1/4$ using Property 2, we obtain

$$-\frac{1}{4} \det(A) = \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & -2 \\ 0 & 1 & 2 \end{pmatrix},$$

which after swapping the last two rows using Property 1, becomes

$$\frac{1}{4} \det(A) = \det\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & -2 \end{pmatrix}.$$

But the right-hand side is triangular and has a determinant of $-2$.
Hence $\frac{1}{4} \det(A) = -2$, so that $\det(A) = -8$. (One can check this using
the original formula for determinant. Which approach is less work?
Which approach is less prone to arithmetical error?)
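The same strategy can be mechanised. The following Python sketch (the function name `det_by_elimination` is an assumption of this example) reduces a matrix to upper-triangular form using only row swaps (Property 1) and row additions (Property 3), tracking the sign, and then multiplies the diagonal entries. Exact rational arithmetic from the standard `fractions` module avoids rounding error.

```python
from fractions import Fraction


def det_by_elimination(A):
    """Determinant via row reduction to upper-triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is not invertible
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign  # Property 1: a row swap flips the sign
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Property 3: subtracting a multiple of another row leaves det unchanged.
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for k in range(n):
        result *= M[k][k]  # determinant of a triangular matrix
    return result


A = [[1, 2, 3],
     [2, 4, 4],
     [3, 2, 1]]
print(det_by_elimination(A))  # prints -8
```

Unlike cofactor expansion, this takes on the order of $n^3$ arithmetic operations rather than $n!$ terms.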


We now give another important property of a determinant, namely its 
multiplicative properties. 


Theorem 14. If A and B are n x n matrices, then det(AB) = 
det (A) det(B). 


Proof First suppose that A is not invertible. Then $L_A$ is not onto
(cf. Lemma 2 of Week 3 notes), which implies that $L_A L_B$ is not onto
(why? Note that the range of $L_A L_B$ must be contained in the range of
$L_A$), so that AB is not invertible. Then by Theorem 12, both sides of
$\det(AB) = \det(A) \det(B)$ are zero, and we are done. Similarly, suppose
that B is not invertible. Then $L_B$ is not one-to-one, and so $L_A L_B$ is not
one-to-one (why? Note that the null space of $L_B$ must be contained in
the null space of $L_A L_B$). So AB is not invertible. Thus both sides are
again zero.


The only remaining case is when A and B are both invertible. By
Lemma 10 we may thus write

$$A = E_1 E_2 \ldots E_p; \quad B = F_1 F_2 \ldots F_q$$

where $E_1, \ldots, E_p, F_1, \ldots, F_q$ are elementary matrices. By many
applications of Lemma 11 we thus have

$$\det(A) = \det(E_1) \det(E_2) \ldots \det(E_p)$$

and

$$\det(B) = \det(F_1) \det(F_2) \ldots \det(F_q).$$

But also

$$AB = E_1 \ldots E_p F_1 \ldots F_q$$

and so by taking det of both sides and using Lemma 11 many times
again we obtain

$$\det(AB) = \det(E_1) \ldots \det(E_p) \det(F_1) \ldots \det(F_q)$$

and by combining all these equations we obtain $\det(AB) = \det(A) \det(B)$
as desired.


Warning: The corresponding statement for addition is not true in
general, i.e. $\det(A + B) \neq \det(A) + \det(B)$ in general. (Can you think
up a counterexample? Even for diagonal matrices one can see this will
not work. On the other hand, we still have linearity in each row and
column).
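Both the theorem and the warning are easy to test on $2 \times 2$ matrices. In this Python sketch the helpers `det2` and `matmul2` and the sample matrices are assumptions made for illustration only.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

# Theorem 14: det(AB) = det(A) det(B).
print(det2(matmul2(A, B)), det2(A) * det2(B))  # prints 10 10

# The additive analogue fails already for the identity matrix:
I = [[1, 0], [0, 1]]
I_plus_I = [[2, 0], [0, 2]]
print(det2(I_plus_I), det2(I) + det2(I))  # prints 4 2
```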


Note that Theorem 14 supersedes Lemma 11, although we needed
Lemma 11 as an intermediate step to prove Theorems 12 and 14.




• (Optional) Remember the symmetry property $\det(A^t) = \det(A)$ we
stated earlier? It can now be proved using the above machinery. We
sketch a proof as follows. First of all, if A is non-invertible, then $A^t$
is also non-invertible (why?), and so both sides are zero. Now if A is
invertible, then by Lemma 10 it is the product of elementary matrices:

$$A = E_1 E_2 \ldots E_k$$

and so

$$\det(A) = \det(E_1) \ldots \det(E_k).$$

On the other hand, taking transposes (and recalling that transpose
reverses multiplication order) we obtain

$$A^t = E_k^t \ldots E_2^t E_1^t$$

and so

$$\det(A^t) = \det(E_k^t) \ldots \det(E_1^t).$$

But a direct computation (checking the three types of elementary
matrix separately) shows that $\det(E^t) = \det(E)$ for every elementary
matrix, so

$$\det(A^t) = \det(E_k) \ldots \det(E_1).$$

Thus $\det(A^t) = \det(A)$ as desired.


* * * * *


Geometric interpretation of determinants (optional) 


• This material is optional, and is also not covered in full detail. It
is intended only for those of you who are interested in the geometric 
provenance of determinants. 


• Up until now we’ve treated the determinant as a mysterious algebraic
expression which has a lot of remarkable properties. But we haven’t
delved much into what the determinant actually means, and why we
have any right to have such a remarkable characteristic of matrices. It
turns out that the determinant measures something very fundamental
to the geometry of $\mathbf{R}^n$, namely n-dimensional volume. The one caveat is
that determinants can be either positive or negative, while volume can
only be positive, so determinants are in fact measuring signed volume -
volume with a sign. (This is similar to how a definite integral $\int_a^b f(x)\,dx$
can be negative if f dips below the x axis, even though the “area under
the curve” interpretation of $\int_a^b f(x)\,dx$ seems to suggest that integrals must
always be positive).


Let’s begin with $\mathbf{R}^1$. The determinant $\det(v_1)$ of a single vector $v_1 = (a)$
in $\mathbf{R}^1$ is of course a, which is plus or minus the length $|a|$ of that vector;
plus if the vector is pointing right, and minus if the vector is pointing
left. In the degenerate case $v_1 = 0$, the determinant is of course zero.


Now let’s look at $\mathbf{R}^2$, and think about the determinant $\det(v_1, v_2)$ of
two vectors $v_1, v_2$ in $\mathbf{R}^2$. This turns out to be (plus or minus) the area of
the parallelogram with sides $v_1$ and $v_2$; plus if $v_2$ is anticlockwise of $v_1$,
and minus if $v_2$ is clockwise of $v_1$. For instance, $\det((1,0), (0,1))$ is the
area of the square with sides (1,0), (0,1), i.e. 1. On the other hand,
$\det((0,1), (1,0))$ is -1 because (1,0) is clockwise of (0,1). Similarly,
$\det((3,0), (0,1))$ is 3, because the rectangle with sides (3,0), (0,1) has
area 3, and $\det((3,1), (0,1))$ is also 3, because the parallelogram with
sides (3,1), (0,1) has the same area as the previous rectangle.


This parallelogram property can be proven using cross products (recall
that the cross product can be used to measure the area of a
parallelogram). It is also interesting to interpret Properties 1, 2, 3 using this
area interpretation. Property 1 says that if you swap the two vectors
$v_1$ and $v_2$, then the sign of the determinant changes. Property 2 says
that if you dilate one of the vectors by c, then the area of the
parallelogram also dilates by $|c|$ (note that if c is negative, then the determinant
changes sign, even though the area is of course always positive, because
you flip the clockwiseness of $v_1$ and $v_2$). Property 3 says that if you
slide $v_2$ (say) by a constant multiple of $v_1$, then the area of the
parallelogram doesn’t change. (This is consistent with the familiar “base
$\times$ height” formula for parallelograms - sliding $v_2$ by $v_1$ does not affect
either the base or the height).


Note also that if $v_1$ and $v_2$ are linearly dependent, then their
parallelogram has area 0; this is consistent with Corollary 13.
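The two-dimensional examples above can be reproduced with a few lines of Python. The function name `signed_area` is an assumption of this sketch; it simply evaluates $\det(v_1, v_2)$ with $v_1, v_2$ as the columns of a $2 \times 2$ matrix.

```python
def signed_area(v1, v2):
    """det(v1, v2): signed area of the parallelogram with sides v1 and v2."""
    return v1[0] * v2[1] - v1[1] * v2[0]


print(signed_area((1, 0), (0, 1)))  # prints 1   (v2 anticlockwise of v1)
print(signed_area((0, 1), (1, 0)))  # prints -1  (v2 clockwise of v1)
print(signed_area((3, 0), (0, 1)))  # prints 3   (3 x 1 rectangle)
print(signed_area((3, 1), (0, 1)))  # prints 3   (sheared rectangle, same area)
print(signed_area((2, 4), (1, 2)))  # prints 0   (linearly dependent vectors)
```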




• Now let’s look at $\mathbf{R}^3$, and think about the determinant $\det(v_1, v_2, v_3)$
of three vectors in $\mathbf{R}^3$. This turns out to be (plus or minus) the
volume of the parallelepiped with sides $v_1, v_2, v_3$ (you may remember
this from Math 32A). To determine plus or minus, one uses the
right-hand rule: if the thumb is at $v_1$ and the second finger is at $v_2$, and
the middle finger is at $v_3$, then we have a plus sign if this can be
achieved using the right hand, and a minus sign if it can be achieved
using the left hand. For instance, $\det((1,0,0), (0,1,0), (0,0,1)) = 1$,
but $\det((0,1,0), (1,0,0), (0,0,1)) = -1$. It is an instructive exercise
to interpret Properties 1, 2, 3 using this geometric picture, as well as
Corollary 13.


• The two-dimensional rule of “determinant is positive if $v_2$ is anticlockwise
of $v_1$” can be interpreted as a right-hand rule using a two-dimensional
hand, while the one-dimensional rule of “determinant is positive if $v_1$
is on the right” can be interpreted as a right-hand rule using a
one-dimensional hand.


• There is a similar link between determinant and n-dimensional volume
in higher dimensions $n > 3$, but it is of course much more difficult to
visualize, and beyond the scope of this course (one needs some grounding
in measure theory, anyway, in order to understand what “n-dimensional
volume” means. Also, one needs n-dimensional hands.). But in
particular, we see that the volume of a parallelepiped with edges $v_1, \ldots, v_n$
is the absolute value of the determinant $\det(v_1, \ldots, v_n)$. (Note that it
doesn’t particularly matter whether we use row or column vectors here
since $\det(A^t) = \det(A)$).


• Let $v_1, \ldots, v_n$ be n column vectors in $\mathbf{R}^n$, so that $(v_1, \ldots, v_n)$ is an $n \times n$
matrix, and consider the linear transformation

$$T := L_{(v_1, \ldots, v_n)}.$$

Observe that

$$T(1,0,\ldots,0) = v_1, \quad \ldots, \quad T(0,0,\ldots,1) = v_n$$

(why is this the case?) So if we let Q be the unit cube with edges
$(1,0,\ldots,0), \ldots, (0,0,\ldots,1)$, then T will map Q to the n-dimensional
parallelepiped with edges $v_1, \ldots, v_n$. (If you are having difficulty
imagining n-dimensional parallelepipeds, you may just want to think
about the n = 3 case). Thus T(Q) has volume $|\det(v_1, \ldots, v_n)|$, while
Q of course had volume 1. Thus T expands volume by a factor of
$|\det(v_1, \ldots, v_n)|$. Thus the magnitude $|\det(A)|$ of a determinant
measures how much the linear transformation $L_A$ expands (n-dimensional)
volume.


Example Consider the matrix

$$A := \begin{pmatrix} 5 & 0 \\ 0 & 3 \end{pmatrix}$$

which as we know has the corresponding linear transformation

$$L_A(x_1, x_2) = (5x_1, 3x_2).$$

This dilates the $x_1$ co-ordinate by 5 and the $x_2$ co-ordinate by 3, so area
(which is 2-dimensional volume) is expanded by 15. This is consistent
with $\det(A) = 15$. Note that if we replace 3 with -3, then the
determinant becomes -15 but area still expands by a factor of 15 (why?). Also,
if we replace 3 instead with 0, then the determinant becomes 0. What
happens to the area in this case?


Example Now consider the matrix

$$A := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

which as we know has the corresponding linear transformation

$$L_A(x_1, x_2) = (x_1 + x_2, x_2).$$

This matrix shears the $x_1$ co-ordinate horizontally by an amount
depending on the $x_2$ co-ordinate, but area is unchanged (why? It has to
do with the base $\times$ height formula for area). This is consistent with
the determinant being 1.




• So the magnitude of the determinant measures the volume-expanding
properties of a linear transformation. The sign of a determinant will 
measure the orientation-preserving properties of a transformation: will 
a “right-handed” object remain right-handed when one applies the 
transformation? If so, the determinant is positive; if however right- 
handed objects become left-handed, then the determinant is negative. 


• Example The reflection matrix

$$A := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

corresponds to reflection through the $x_1$-axis:

$$L_A(x_1, x_2) = (x_1, -x_2).$$

It is clear that a “right-handed” object (which in two dimensions, means
an arrow pointing anti-clockwise) will reflect to a “left-handed” object
(an arrow pointing clockwise). This is why reflections have negative
determinant.


• This interpretation of determinant, as measuring both the volume
expanding and the orientation preserving properties of a transformation,
also allows us to interpret Theorem 14 geometrically. For instance, if
$T : \mathbf{R}^n \to \mathbf{R}^n$ expands volume by a factor of 4 and flips the
orientation (so $\det [T]_\beta^\beta = -4$, where $\beta$ is the standard ordered basis), and
$S : \mathbf{R}^n \to \mathbf{R}^n$ expands volume by a factor of 3 and also flips the
orientation (so $\det [S]_\beta^\beta = -3$), then one can now see why ST should expand
volume by 12 and preserve orientation (so $\det [ST]_\beta^\beta = +12$).


• We now close with a little lemma that says that to take the determinant
of a matrix, it doesn’t matter what basis you use. 


• Lemma 15. If two matrices are similar, then they have the same
determinant. 


• Proof If A is similar to B, then $B = Q^{-1} A Q$ for some invertible matrix
Q. Thus by Theorem 14

$$\det(B) = \det(Q^{-1}) \det(A) \det(Q) = \det(A) \det(Q^{-1}) \det(Q) = \det(A) \det(Q^{-1} Q) = \det(A) \det(I_n) = \det(A)$$

as desired.


• Corollary 16. Let $T : \mathbf{R}^n \to \mathbf{R}^n$ be a linear transformation, and let
$\beta, \beta'$ be two ordered bases for $\mathbf{R}^n$. Then the matrices $[T]_\beta^\beta$ and $[T]_{\beta'}^{\beta'}$
have the same determinant.


• Proof. From last week’s notes we know that $[T]_\beta^\beta$ and $[T]_{\beta'}^{\beta'}$ are similar,
and the result follows from Lemma 15.




Math 115A - Week 7 
Textbook sections: 4.5, 5.1-5.2 
Topics covered: 


• Cramer’s rule
• Diagonal matrices
• Eigenvalues and eigenvectors
• Diagonalization


* * * * *


Cramer’s rule 


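Let A be an $n \times n$ matrix. Last week we introduced the notion of the
determinant $\det(A)$ of A, and also that of a cofactor $\tilde A_{ij}$ associated to
each row i and column j (here $\tilde A_{ij}$ denotes the determinant of the minor
obtained by deleting row i and column j). Given any row i, we then have
the cofactor expansion formula

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \tilde A_{ij}.$$

For instance, if

$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$

then

$$\det(A) = a \tilde A_{11} - b \tilde A_{12} + c \tilde A_{13}$$

or in other words

$$\det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = a \det\begin{pmatrix} e & f \\ h & i \end{pmatrix} - b \det\begin{pmatrix} d & f \\ g & i \end{pmatrix} + c \det\begin{pmatrix} d & e \\ g & h \end{pmatrix}.$$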



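• Suppose we replace the row (a, b, c) by (d, e, f) in the above example.
Then we have

$$\det\begin{pmatrix} d & e & f \\ d & e & f \\ g & h & i \end{pmatrix} = d \det\begin{pmatrix} e & f \\ h & i \end{pmatrix} - e \det\begin{pmatrix} d & f \\ g & i \end{pmatrix} + f \det\begin{pmatrix} d & e \\ g & h \end{pmatrix}.$$

But the left-hand side is zero because two of the rows are the same
(see the Corollary of Property 1 of determinants in the previous week’s
notes). Thus we have

$$0 = d \tilde A_{11} - e \tilde A_{12} + f \tilde A_{13}.$$

Similarly we have

$$0 = g \tilde A_{11} - h \tilde A_{12} + i \tilde A_{13}.$$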


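We can also do the same analysis with the cofactor expansion along
the second row

$$\det(A) = -d \tilde A_{21} + e \tilde A_{22} - f \tilde A_{23}$$

yielding

$$0 = -a \tilde A_{21} + b \tilde A_{22} - c \tilde A_{23}$$
$$0 = -g \tilde A_{21} + h \tilde A_{22} - i \tilde A_{23}.$$

And similarly for the third row:

$$\det(A) = g \tilde A_{31} - h \tilde A_{32} + i \tilde A_{33}$$
$$0 = a \tilde A_{31} - b \tilde A_{32} + c \tilde A_{33}$$
$$0 = d \tilde A_{31} - e \tilde A_{32} + f \tilde A_{33}.$$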


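We can put all these nine identities together in a compact matrix form
as

$$\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} +\tilde A_{11} & -\tilde A_{21} & +\tilde A_{31} \\ -\tilde A_{12} & +\tilde A_{22} & -\tilde A_{32} \\ +\tilde A_{13} & -\tilde A_{23} & +\tilde A_{33} \end{pmatrix} = \begin{pmatrix} \det(A) & 0 & 0 \\ 0 & \det(A) & 0 \\ 0 & 0 & \det(A) \end{pmatrix}.$$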




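The second matrix on the left-hand side is known as the adjugate of A,
and is denoted adj(A):

$$\mathrm{adj}(A) := \begin{pmatrix} \tilde A_{11} & -\tilde A_{21} & \tilde A_{31} \\ -\tilde A_{12} & \tilde A_{22} & -\tilde A_{32} \\ \tilde A_{13} & -\tilde A_{23} & \tilde A_{33} \end{pmatrix}.$$

Thus we have the identity for $3 \times 3$ matrices

$$A \, \mathrm{adj}(A) = \det(A) I_3.$$

The adjugate matrix is the transpose of the cofactor matrix cof(A):

$$\mathrm{cof}(A) := \begin{pmatrix} \tilde A_{11} & -\tilde A_{12} & \tilde A_{13} \\ -\tilde A_{21} & \tilde A_{22} & -\tilde A_{23} \\ \tilde A_{31} & -\tilde A_{32} & \tilde A_{33} \end{pmatrix}.$$

Thus $\mathrm{adj}(A) = \mathrm{cof}(A)^t$. To compute the cofactor matrix, at every row
i and column j we extract the minor corresponding to that row and
column, take the $(n-1) \times (n-1)$ determinant of the minor, and then place
that number in the ij entry of the cofactor matrix. Then we alternate
the signs by $(-1)^{i+j}$.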


More generally, for $n \times n$ matrices, we can define the cofactor matrix
by

$$\mathrm{cof}(A)_{ij} = (-1)^{i+j} \tilde A_{ij}$$

and the adjugate matrix by $\mathrm{adj}(A) = \mathrm{cof}(A)^t$, so that

$$\mathrm{adj}(A)_{ij} = (-1)^{j+i} \tilde A_{ji},$$

and then we have the identity

$$A \, \mathrm{adj}(A) = \det(A) I_n.$$

If $\det(A)$ is non-zero, then A is invertible and we thus have

$$A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A).$$

This is known as Cramer’s rule - it allows us to compute the inverse of
a matrix using determinants. (There is also a closely related rule, also
known as Cramer’s rule, which allows one to solve equations Ax = b
when A is invertible; see the textbook).
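As an illustration of the adjugate formula, here is a minimal Python sketch. The helper names (`minor`, `det`, `adjugate`, `inverse`, `matmul`) and the test matrix are assumptions of this example, and indices are 0-based. Exact fractions are used so that $A A^{-1} = I_n$ can be checked without rounding.

```python
from fractions import Fraction


def minor(A, i, j):
    """A with row i and column j removed (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1  # convention for the empty (0 x 0) matrix
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """adj(A)_ij = (-1)^(i+j) * det(minor of A at row j, column i)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(A):
    """A^{-1} = adj(A) / det(A), valid only when det(A) is non-zero."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 3],
     [2, 4, 4],
     [3, 2, 1]]
print(adjugate(A))            # prints [[-4, 4, -4], [10, -8, 2], [-8, 4, 0]]
print(matmul(A, adjugate(A)))  # prints [[-8, 0, 0], [0, -8, 0], [0, 0, -8]]
```

This is fine for small matrices; for large ones the row-reduction method from last week's notes is far cheaper.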




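• For example, in the $2 \times 2$ case, if

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

then the cofactor matrix is

$$\mathrm{cof}(A) = \begin{pmatrix} d & -c \\ -b & a \end{pmatrix}$$

and so the adjugate matrix is

$$\mathrm{adj}(A) = \mathrm{cof}(A)^t = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

and so, if $\det(A) \neq 0$, the inverse of A is

$$A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A) = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$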


* * * * *


Diagonal matrices 


• Matrices in general are complicated objects to manipulate, and we are
always looking for ways to simplify them into something better. Last
week we explored one such way to do so: using elementary row (or
column) operations to reduce a matrix into row-echelon form, or to
even simpler forms. This type of simplification is good for certain
purposes (computing rank, determinant, inverse), but is not good for
other purposes. For instance, suppose you want to raise a matrix A to
a large power, say $A^{100}$. Using elementary matrices to reduce A to, say,
row-echelon form will not be very helpful, because (a) it is still not very
easy to raise row-echelon form matrices to very large powers, and (b)
one has to somehow deal with all the elementary matrices one used to
convert A into row-echelon form. However, to perform tasks like this
there is a better factorization available, known as diagonalization. But
before we do this, we first digress on diagonal matrices.




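• Definition An $n \times n$ matrix A is said to be diagonal if all the
off-diagonal entries are zero, i.e. $A_{ij} = 0$ whenever $i \neq j$. Equivalently, a
diagonal matrix is of the form

$$A = \begin{pmatrix} A_{11} & 0 & \cdots & 0 \\ 0 & A_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{nn} \end{pmatrix}.$$

We write this matrix as $\mathrm{diag}(A_{11}, A_{22}, \ldots, A_{nn})$. Thus for instance

$$\mathrm{diag}(1, 3, 5) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$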

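• Diagonal matrices are very easy to add, scalar multiply, and multiply.
One can easily verify that

$$\mathrm{diag}(a_1, a_2, \ldots, a_n) + \mathrm{diag}(b_1, b_2, \ldots, b_n) = \mathrm{diag}(a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n),$$

$$c \, \mathrm{diag}(a_1, a_2, \ldots, a_n) = \mathrm{diag}(c a_1, c a_2, \ldots, c a_n)$$

and

$$\mathrm{diag}(a_1, a_2, \ldots, a_n) \, \mathrm{diag}(b_1, b_2, \ldots, b_n) = \mathrm{diag}(a_1 b_1, a_2 b_2, \ldots, a_n b_n).$$

Thus for instance

$$\mathrm{diag}(1, 3, 5) \, \mathrm{diag}(1, 3, 5) = \mathrm{diag}(1^2, 3^2, 5^2)$$

and more generally

$$\mathrm{diag}(1, 3, 5)^n = \mathrm{diag}(1^n, 3^n, 5^n).$$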


Thus raising diagonal matrices to high powers is very easy. More
generally, polynomial expressions of a diagonal matrix are very easy to
compute. For instance, consider the polynomial $f(x) = x^3 + 4x^2 + 2$.
We can apply this polynomial to any $n \times n$ matrix A, creating the new
matrix $f(A) = A^3 + 4A^2 + 2I_n$. In general, such a matrix may be difficult
to compute. But for a diagonal matrix $A = \mathrm{diag}(a_1, a_2, \ldots, a_n)$, we
have

$$A^2 = \mathrm{diag}(a_1^2, a_2^2, \ldots, a_n^2)$$

$$A^3 = \mathrm{diag}(a_1^3, a_2^3, \ldots, a_n^3)$$

and thus

$$f(A) = \mathrm{diag}(a_1^3 + 4a_1^2 + 2, a_2^3 + 4a_2^2 + 2, \ldots, a_n^3 + 4a_n^2 + 2) = \mathrm{diag}(f(a_1), \ldots, f(a_n)).$$

This is true for more general polynomials f:

$$f(\mathrm{diag}(a_1, \ldots, a_n)) = \mathrm{diag}(f(a_1), \ldots, f(a_n)).$$

Thus to do any sort of polynomial operation to a diagonal matrix, one
just has to perform it on the diagonal entries separately.
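The identity $f(\mathrm{diag}(a_1, \ldots, a_n)) = \mathrm{diag}(f(a_1), \ldots, f(a_n))$ can be checked directly for the polynomial above. In this Python sketch the helper names `diag`, `matmul` and `f_of_matrix` are assumptions of the example; `f_of_matrix` computes $A^3 + 4A^2 + 2I_n$ the slow way, by full matrix multiplication.

```python
def diag(*entries):
    """Build the diagonal matrix diag(entries) as a list of rows."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def f(x):
    """The scalar polynomial f(x) = x^3 + 4x^2 + 2."""
    return x ** 3 + 4 * x ** 2 + 2


def f_of_matrix(A):
    """f(A) = A^3 + 4A^2 + 2I, computed with full matrix products."""
    n = len(A)
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    I = diag(*[1] * n)
    return [[A3[i][j] + 4 * A2[i][j] + 2 * I[i][j] for j in range(n)]
            for i in range(n)]


A = diag(1, 3, 5)
print(f_of_matrix(A) == diag(f(1), f(3), f(5)))  # prints True
print([f(1), f(3), f(5)])                        # prints [7, 65, 227]
```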


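If A is an $n \times n$ diagonal matrix $A = \mathrm{diag}(a_1, \ldots, a_n)$, then the linear
transformation $L_A : \mathbf{R}^n \to \mathbf{R}^n$ is very simple:

$$L_A \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_1 x_1 \\ a_2 x_2 \\ \vdots \\ a_n x_n \end{pmatrix}.$$

Thus $L_A$ dilates the first co-ordinate $x_1$ by $a_1$, the second co-ordinate $x_2$
by $a_2$, and so forth. In particular, if $\beta = (e_1, e_2, \ldots, e_n)$ is the standard
ordered basis for $\mathbf{R}^n$, then

$$L_A e_1 = a_1 e_1; \quad L_A e_2 = a_2 e_2; \quad \ldots; \quad L_A e_n = a_n e_n.$$

This leads naturally to the concept of eigenvalues and eigenvectors,
which we now discuss.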


Remember from last week that the rank of a matrix is equal to the 
number of non-zero rows in row echelon form. Thus it is easy to see 
that 


Lemma 1. The rank of a diagonal matrix is equal to the number of 
its non-zero entries. 




Thus, for instance, diag(3, 4,5,0,0,0) has rank 3. 


* * * * *


Eigenvalues and eigenvectors 


Let $T : V \to V$ be a linear transformation from a vector space V to
itself. One of the simplest possible examples of such a transformation
is the identity transformation $T = I_V$, so that $Tv = v$ for all $v \in V$.
After the identity operation, the next simplest example of such a
transformation is a dilation $T = \lambda I_V$ for some scalar $\lambda$, so that $Tv = \lambda v$
for all $v \in V$.


In general, though, T does not look like a dilation. However, there are
often some special vectors in V for which T is as simple as a dilation,
and these are known as eigenvectors.


Definition An eigenvector v of T is a non-zero vector $v \in V$ such that
$Tv = \lambda v$ for some scalar $\lambda$. The scalar $\lambda$ is known as the eigenvalue
corresponding to v.


Example Consider the linear transformation $T : \mathbf{R}^2 \to \mathbf{R}^2$ defined by
$T(x, y) := (5x, 3y)$. Then the vector v = (1, 0) is an eigenvector of T
with eigenvalue 5, since Tv = T(1, 0) = (5, 0) = 5v. More generally,
any non-zero vector of the form (x, 0) is an eigenvector with eigenvalue
5. Similarly, (0, y) is an eigenvector of T with eigenvalue 3, if y is
non-zero. The vector v = (1, 1) is not an eigenvector, because Tv = (5, 3)
is not a scalar multiple of v.


Example More generally, if $A = \mathrm{diag}(a_1, \ldots, a_n)$ is a diagonal matrix,
then the basis vectors $e_1, \ldots, e_n$ are eigenvectors for $L_A$, with
eigenvalues $a_1, \ldots, a_n$ respectively.


Example If T is the identity operator, then every non-zero vector
is an eigenvector, with eigenvalue 1 (why?). More generally, if $T =
\lambda I_V$ is $\lambda$ times the identity operator, then every non-zero vector is an
eigenvector, with eigenvalue $\lambda$ (why?).


Example If $T : V \to V$ is any linear transformation, and v is any
non-zero vector in the null space N(T), then v is an eigenvector with
eigenvalue 0. (Why?)




Example Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the reflection through the line l
connecting the origin to (4, 3). Then (4, 3) is an eigenvector with eigenvalue 1
(why?), and (3, -4) is an eigenvector with eigenvalue -1 (why?).


We do not consider the 0 vector as an eigenvector, even though T0 is
always 0, because we cannot determine what eigenvalue 0 should have.


If A is an $n \times n$ matrix, we say that v is an eigenvector for A with
eigenvalue $\lambda$ if it is already an eigenvector for $L_A$ with eigenvalue $\lambda$, i.e.
$Av = \lambda v$. In other words, for the purposes of computing eigenvalues
and eigenvectors we do not distinguish between a matrix A and its
linear transformation $L_A$.


(Incidentally, the word “eigen” is German for “own”. An eigenvector
is a vector which keeps its own direction when acted on by T. The
terminology is thus a hybrid of German and English, though some
people prefer “principal value” and “principal vector” to avoid this (or
“characteristic” or “proper” instead of “principal”). Then again,
“vector” is pure Latin. English is very cosmopolitan).


Definition Let $T : V \to V$ be a linear transformation, and let $\lambda$ be
a scalar. Then the eigenspace of T corresponding to $\lambda$ is the set of all
vectors v (including 0) such that $Tv = \lambda v$.


Thus an eigenvector with eigenvalue $\lambda$ is the same thing as a
non-zero element of the eigenspace with eigenvalue $\lambda$. Since $Tv = \lambda v$ is
equivalent to $(T - \lambda I_V)v = 0$, we thus see that the eigenspace of T
with eigenvalue $\lambda$ is the same thing as the null space $N(T - \lambda I_V)$ of
$T - \lambda I_V$. In particular, the eigenspace is always a subspace of V. From
the above discussion we also see that $\lambda$ is an eigenvalue of T if and only
if $N(T - \lambda I_V)$ is non-zero, i.e. when $T - \lambda I_V$ is not one-to-one.


Example Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the transformation $T(x, y) := (5x, 3y)$.
Then the x-axis is the eigenspace $N(T - 5I_{\mathbf{R}^2})$ with eigenvalue 5, while
the y-axis is the eigenspace $N(T - 3I_{\mathbf{R}^2})$ with eigenvalue 3. For all other
values $\lambda \neq 3, 5$, the eigenspace $N(T - \lambda I_{\mathbf{R}^2})$ is just the zero vector space
$\{0\}$ (why?).




The relationship between eigenvectors and diagonal matrices is the
following.


Lemma 2. Suppose that V is an n-dimensional vector space, and
suppose that $T : V \to V$ is a linear transformation. Suppose that
V has an ordered basis $\beta = (v_1, v_2, \ldots, v_n)$, such that each $v_j$ is an
eigenvector of T with eigenvalue $\lambda_j$. Then the matrix $[T]_\beta^\beta$ is a diagonal
matrix; in fact $[T]_\beta^\beta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.


Conversely, if $\beta = (v_1, v_2, \ldots, v_n)$ is a basis such that $[T]_\beta^\beta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$,
then each $v_j$ is an eigenvector of T with eigenvalue $\lambda_j$.


Proof Suppose that $v_j$ is an eigenvector of T with eigenvalue $\lambda_j$. Then
$Tv_j = \lambda_j v_j$, so $[Tv_j]^\beta$ is just the column vector with $j^{th}$ entry equal
to $\lambda_j$, and all other entries zero. Putting all these column vectors
together we see that $[T]_\beta^\beta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Conversely, if $[T]_\beta^\beta =
\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then by definition of $[T]_\beta^\beta$ we see that $Tv_j = \lambda_j v_j$, and
so $v_j$ is an eigenvector with eigenvalue $\lambda_j$.


Definition. A linear transformation $T : V \to V$ is said to be
diagonalizable if there is an ordered basis $\beta$ of V for which the matrix $[T]_\beta^\beta$
is diagonal.


Lemma 2 thus says that a transformation is diagonalizable if and only
if it has a basis consisting entirely of eigenvectors.


Example Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the reflection through the line l
connecting the origin to (4, 3). Then (4, 3) and (3, -4) are both eigenvectors
for T. Since these two vectors are linearly independent and $\mathbf{R}^2$ is
two-dimensional, they form a basis for $\mathbf{R}^2$. Thus T is diagonalizable; indeed,
if $\beta := ((4, 3), (3, -4))$, then

$$[T]_\beta^\beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \mathrm{diag}(1, -1).$$


If one knows how to diagonalize a transformation, then it becomes very
easy to manipulate. For example, in the above reflection example we
see very quickly that T must have rank 2 (since diag(1, -1) has two
non-zero entries). Also, we can square T easily:

$$[T^2]_\beta^\beta = \mathrm{diag}(1, -1)^2 = \mathrm{diag}(1, 1) = I_2 = [I_{\mathbf{R}^2}]_\beta^\beta$$

and hence $T^2 = I_{\mathbf{R}^2}$, the identity transformation. (Geometrically, this
amounts to the fact that if you reflect twice around the same line, you
get the identity).
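The reflection example can be checked in standard co-ordinates. The notes do not write down the matrix of T, so the sketch below assumes the standard formula $2uu^t - I_2$ for reflection through the line spanned by a unit vector u; the helper names `reflection_matrix`, `apply` and `matmul` are likewise assumptions of this example.

```python
from fractions import Fraction


def reflection_matrix(p, q):
    """Matrix (in the standard basis) of the reflection of R^2 through the
    line joining the origin to (p, q). Assumed formula: 2 u u^t - I, where
    u = (p, q) / |(p, q)|; written here without square roots."""
    n2 = p * p + q * q
    return [[Fraction(2 * p * p, n2) - 1, Fraction(2 * p * q, n2)],
            [Fraction(2 * p * q, n2), Fraction(2 * q * q, n2) - 1]]


def apply(M, v):
    """Matrix-vector product M v, returned as a tuple."""
    return tuple(sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M)))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


T = reflection_matrix(4, 3)
print(apply(T, (4, 3)) == (4, 3))     # prints True: eigenvalue 1
print(apply(T, (3, -4)) == (-3, 4))   # prints True: eigenvalue -1
print(matmul(T, T) == [[1, 0], [0, 1]])  # prints True: reflecting twice is the identity
```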


Definition. An $n \times n$ matrix A is said to be diagonalizable if the
corresponding linear transformation $L_A$ is diagonalizable.


Example The matrix A = diag(5, 3) is diagonalizable, because the
linear operator $L_A$ in the standard basis $\beta = ((1, 0), (0, 1))$ is just A
itself: $[L_A]_\beta^\beta = A$, which is diagonal. So all diagonal matrices are
diagonalizable (no surprise there).


Lemma 3. A matrix A is diagonalizable if and only if $A = QDQ^{-1}$
for some invertible matrix Q and some diagonal matrix D. In other
words, a matrix is diagonalizable if and only if it is similar to a diagonal
matrix.


Proof Suppose A was diagonalizable. Then $[L_A]_{\beta'}^{\beta'}$ would be equal to
some diagonal matrix D, for some choice of basis $\beta'$ (which may be
different from the standard ordered basis $\beta$). But by the change of
variables formula,

$$A = [L_A]_\beta^\beta = Q [L_A]_{\beta'}^{\beta'} Q^{-1} = QDQ^{-1}$$

as desired, where $Q := [I_{\mathbf{R}^n}]_{\beta'}^{\beta}$ is the change of variables matrix.


Conversely, suppose that $A = QDQ^{-1}$ for some invertible matrix Q.
Write $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, so that $De_j = \lambda_j e_j$. Then

$$A(Qe_j) = QDQ^{-1}Qe_j = QDe_j = Q\lambda_j e_j = \lambda_j (Qe_j)$$

and so $Qe_j$ is an eigenvector for A with eigenvalue $\lambda_j$. Since Q is
invertible and $e_1, \ldots, e_n$ is a basis, we see that $Qe_1, \ldots, Qe_n$ is also a
basis (why?). Thus we have found a basis of $\mathbf{R}^n$ consisting entirely of
eigenvectors of A, and so A is diagonalizable.




From Lemma 2 and Lemma 3 we see that if we can find a basis
$(v_1,\ldots,v_n)$ of $\mathbf{R}^n$ which consists entirely of eigenvectors of $A$, then
$A$ is diagonalizable and $A = QDQ^{-1}$ for some diagonal matrix $D$ and
some invertible matrix $Q$. We now make this statement more precise,
specifying precisely what $Q$ and $D$ are.


Lemma 4. Let $A$ be an $n \times n$ matrix, and suppose that $(v_1,\ldots,v_n)$
is an ordered basis of $\mathbf{R}^n$ such that each $v_j$ is an eigenvector of $A$ with
eigenvalue $\lambda_j$ (i.e. $Av_j = \lambda_j v_j$ for $j = 1,\ldots,n$). Then we have

$$A = Q\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,Q^{-1}$$

where $Q$ is the $n \times n$ matrix with columns $v_1, v_2, \ldots, v_n$:

$$Q = (v_1, v_2, \ldots, v_n).$$


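Proof. Let $\beta'$ be the ordered basis $\beta' := (v_1, v_2, \ldots, v_n)$ of $\mathbf{R}^n$, and let
$\beta := (e_1, e_2, \ldots, e_n)$ be the standard ordered basis of $\mathbf{R}^n$. Then

$$[I_{\mathbf{R}^n}]_{\beta'}^{\beta} = Q$$

(why?). So by the change of variables formula

$$A = [L_A]_\beta^\beta = Q [L_A]_{\beta'}^{\beta'} Q^{-1}.$$

On the other hand, since $L_A v_j = \lambda_j v_j$, we see that

$$[L_A]_{\beta'}^{\beta'} = \mathrm{diag}(\lambda_1,\ldots,\lambda_n).$$

Combining these two equations we obtain the lemma.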
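Lemma 4 can be read as a recipe: put the eigenvectors in the columns of $Q$, the eigenvalues on the diagonal of $D$, and multiply out $QDQ^{-1}$. The following sketch does this for a $2 \times 2$ case. The eigenvectors $(1,1)$, $(1,-1)$ and eigenvalues $3$, $1$ are sample data chosen for illustration (they are not taken from the notes), and the function names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(Q):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = Q
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def from_eigendata(eigenvectors, eigenvalues):
    """Return Q diag(eigenvalues) Q^{-1}, where the columns of Q are the eigenvectors."""
    v1, v2 = eigenvectors
    Q = [[v1[0], v2[0]], [v1[1], v2[1]]]
    D = [[eigenvalues[0], 0], [0, eigenvalues[1]]]
    return matmul(matmul(Q, D), inverse_2x2(Q))

# Sample data: v1 = (1, 1) with eigenvalue 3, v2 = (1, -1) with eigenvalue 1.
A = from_eigendata([(1, 1), (1, -1)], [3, 1])
print(A)  # the matrix that has exactly this eigendata
```

One can then confirm by hand that the resulting matrix sends $(1,1)$ to $3(1,1)$ and $(1,-1)$ to itself.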


* * * * *


Computing eigenvalues 


• Now we compute the eigenvalues and eigenvectors of a general matrix.
The key lemma here is

• Lemma 5. A scalar $\lambda$ is an eigenvalue of an $n \times n$ square matrix $A$ if
and only if $\det(A - \lambda I_n) = 0$.




• Proof. If $\lambda$ is an eigenvalue of $A$, then $Av = \lambda v$ for some non-zero $v$,
thus $(A - \lambda I_n)v = 0$. Thus $A - \lambda I_n$ is not invertible, and so
$\det(A - \lambda I_n) = 0$ (Theorem 12 from last week's notes). Conversely, if
$\det(A - \lambda I_n) = 0$, then $A - \lambda I_n$ is not invertible (again by Theorem 12), which
means that the corresponding linear transformation is not one-to-one
(recall that one-to-one and onto are equivalent when the domain and
range have the same dimension; see Lemma 2 of Week 3 notes). So we
have $(A - \lambda I_n)v = 0$ for some non-zero $v$, which means that $Av = \lambda v$
and hence $\lambda$ is an eigenvalue.


• Because of this lemma, we call $\det(A - \lambda I_n)$ the characteristic
polynomial of $A$, and sometimes call it $f(\lambda)$. Lemma 5 then says that the
eigenvalues of $A$ are precisely the zeroes of $f(\lambda)$.


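• Example Let $A$ be the matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.$$

Then the characteristic polynomial $f(\lambda)$ is given by

$$f(\lambda) = \det\begin{pmatrix} -\lambda & 1 \\ 1 & 1-\lambda \end{pmatrix} = -\lambda(1-\lambda) - 1 \times 1 = \lambda^2 - \lambda - 1.$$

From the quadratic formula, this polynomial has zeroes when
$\lambda = (1 \pm \sqrt{5})/2$, and so the eigenvalues are $\lambda_1 := (1+\sqrt{5})/2 = 1.618\ldots$
and $\lambda_2 := (1-\sqrt{5})/2 = -0.618\ldots$.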
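The computation above can be mirrored in a few lines. This is a minimal sketch for $2 \times 2$ real matrices only; the function names are made up for illustration, and the coefficients are obtained by expanding $\det(A - \lambda I)$ directly.

```python
import math

def char_poly_2x2(A):
    """Coefficients (c2, c1, c0) of det(A - t I) = c2 t^2 + c1 t + c0 for a 2x2 matrix."""
    (a, b), (c, d) = A
    # (a - t)(d - t) - bc = t^2 - (a + d) t + (ad - bc)
    return (1, -(a + d), a * d - b * c)

def real_roots(c2, c1, c0):
    """Real roots of c2 t^2 + c1 t + c0 by the quadratic formula, largest first."""
    disc = c1 * c1 - 4 * c2 * c0
    if disc < 0:
        return []  # no real eigenvalues
    r = math.sqrt(disc)
    return sorted([(-c1 - r) / (2 * c2), (-c1 + r) / (2 * c2)], reverse=True)

A = [[0, 1], [1, 1]]
coeffs = char_poly_2x2(A)
lam1, lam2 = real_roots(*coeffs)
print(coeffs)      # coefficients of t^2 - t - 1
print(lam1, lam2)  # approximately 1.618 and -0.618
```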


• Once we have the eigenvalues of $A$, we can compute eigenvectors,
because the eigenvectors with eigenvalue $\lambda$ are precisely those non-zero
vectors in the null-space of $A - \lambda I_n$ (or equivalently, of the null space
of $L_A - \lambda I_{\mathbf{R}^n}$).


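• Example: Let $A$ be the above matrix. Let us try to find the
eigenvectors $\begin{pmatrix} x \\ y \end{pmatrix}$ with eigenvalue $\lambda_1 = (1+\sqrt{5})/2$. In other words, we want
to solve the equation

$$\left(A - \frac{1+\sqrt{5}}{2} I_2\right)\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

or in other words

$$\begin{pmatrix} -\frac{1+\sqrt{5}}{2} & 1 \\ 1 & \frac{1-\sqrt{5}}{2} \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

or equivalently

$$-\frac{1+\sqrt{5}}{2}\,x + y = 0, \qquad x + \frac{1-\sqrt{5}}{2}\,y = 0.$$

This matrix does not have full rank (since its determinant is zero - why
should this not be surprising?). Indeed, the second equation here is
just $(1-\sqrt{5})/2$ times the first equation. So the general solution is $y$
arbitrary, and $x$ equal to $-\frac{1-\sqrt{5}}{2}\,y$. In particular, we have

$$v_1 := \begin{pmatrix} -\frac{1-\sqrt{5}}{2} \\ 1 \end{pmatrix}$$

as an eigenvector of $A$ with eigenvalue $\lambda_1 = (1+\sqrt{5})/2$.

A similar argument gives

$$v_2 := \begin{pmatrix} -\frac{1+\sqrt{5}}{2} \\ 1 \end{pmatrix}$$

as an eigenvector of $A$ with eigenvalue $\lambda_2 = (1-\sqrt{5})/2$. Thus we have
$Av_1 = \lambda_1 v_1$ and $Av_2 = \lambda_2 v_2$. Thus, if we let $\beta'$ be the ordered basis
$\beta' := (v_1, v_2)$, then

$$[L_A]_{\beta'}^{\beta'} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$

Thus $A$ is diagonalizable. Indeed, from Lemma 4 we have

$$A = QDQ^{-1}$$

where $D := \mathrm{diag}(\lambda_1, \lambda_2)$ and $Q := (v_1, v_2)$.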

As an application, we recall the example of Fibonacci's rabbits from
Week 2. If at the beginning of a year there are $x$ pairs of juvenile
rabbits and $y$ pairs of adult rabbits - which we represent by the vector
$\begin{pmatrix} x \\ y \end{pmatrix}$ in $\mathbf{R}^2$ - then at the end of the next year there will be $y$ juvenile
pairs and $x + y$ adult pairs - so the new vector is

$$\begin{pmatrix} y \\ x+y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = A\begin{pmatrix} x \\ y \end{pmatrix}.$$

Thus, each passage of a year multiplies the population vector by $A$. So
if we start with one juvenile pair and no adult pairs - so the population
vector is initially $v_0 := \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ - then after $n$ years, the population vector
should become $A^n v_0$. To compute this, one would have to multiply $A$
by itself $n$ times, which appears to be difficult (try computing a high
power of $A$ by hand, for instance!). However, this can be done efficiently
using the diagonalization $A = QDQ^{-1}$ we have. Observe that

$$A^2 = QDQ^{-1}QDQ^{-1} = QD^2Q^{-1}$$
$$A^3 = A^2 A = QD^2Q^{-1}QDQ^{-1} = QD^3Q^{-1}$$

and more generally (by induction on $n$)

$$A^n = QD^nQ^{-1}.$$

In particular, our population vector after $n$ years is

$$A^n v_0 = QD^nQ^{-1}v_0.$$


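But since $D$ is the diagonal matrix $D = \mathrm{diag}(\lambda_1, \lambda_2)$, $D^n$ is easy to
compute:

$$D^n = \mathrm{diag}(\lambda_1^n, \lambda_2^n).$$

Now we can compute $QD^nQ^{-1}v_0$. Since

$$Q = (v_1, v_2) = \begin{pmatrix} -\frac{1-\sqrt{5}}{2} & -\frac{1+\sqrt{5}}{2} \\ 1 & 1 \end{pmatrix}$$

we have

$$\det(Q) = -\frac{1-\sqrt{5}}{2} + \frac{1+\sqrt{5}}{2} = \sqrt{5}$$

and so by Cramer's rule

$$Q^{-1} = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 & \frac{1+\sqrt{5}}{2} \\ -1 & -\frac{1-\sqrt{5}}{2} \end{pmatrix}$$

and so

$$Q^{-1}v_0 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

and hence

$$D^nQ^{-1}v_0 = \mathrm{diag}(\lambda_1^n, \lambda_2^n)\,\frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} \lambda_1^n \\ -\lambda_2^n \end{pmatrix}.$$

Since $\frac{1+\sqrt{5}}{2} \cdot \frac{1-\sqrt{5}}{2} = \lambda_1\lambda_2 = -1$, we have

$$Q = \begin{pmatrix} -\lambda_2 & -\lambda_1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \lambda_1^{-1} & \lambda_2^{-1} \\ 1 & 1 \end{pmatrix}$$

and hence

$$A^n v_0 = QD^nQ^{-1}v_0 = \begin{pmatrix} \lambda_1^{-1} & \lambda_2^{-1} \\ 1 & 1 \end{pmatrix}\frac{1}{\sqrt{5}}\begin{pmatrix} \lambda_1^n \\ -\lambda_2^n \end{pmatrix} = \begin{pmatrix} (\lambda_1^{n-1} - \lambda_2^{n-1})/\sqrt{5} \\ (\lambda_1^n - \lambda_2^n)/\sqrt{5} \end{pmatrix}.$$

Thus, after $n$ years, the number of pairs of juvenile rabbits is

$$F_{n-1} = (\lambda_1^{n-1} - \lambda_2^{n-1})/\sqrt{5} = ((1.618\ldots)^{n-1} - (-0.618\ldots)^{n-1})/2.236\ldots,$$

and the number of pairs of adult rabbits is

$$F_n = (\lambda_1^n - \lambda_2^n)/\sqrt{5} = ((1.618\ldots)^n - (-0.618\ldots)^n)/2.236\ldots.$$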


This is a remarkable formula - it does not look like it at all, but the
expressions $F_{n-1}$, $F_n$ are always integers. For instance

$$F_3 = ((1.618\ldots)^3 - (-0.618\ldots)^3)/2.236\ldots = 2.$$

(Check this!). The numbers

$$F_0 = 0,\ F_1 = 1,\ F_2 = 1,\ F_3 = 2,\ F_4 = 3,\ F_5 = 5,\ F_6 = 8,\ F_7 = 13,\ F_8 = 21,\ \ldots$$




are known as Fibonacci numbers and come up in all sorts of places
(including, oddly enough, the number of petals on flowers and pine
cones). The above formula shows that these numbers grow exponentially,
and are comparable to $(1.618\ldots)^n$ when $n$ gets large. The number
$1.618\ldots = (1+\sqrt{5})/2$ is known as the golden ratio and has several
interesting properties, which we will not go into here.
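The closed formula can be compared against brute-force iteration of the rule $(x, y) \mapsto (y, x+y)$, which is what multiplying by $A$ does. The sketch below is an illustration only; the function names are invented, and the closed form is evaluated in floating point, so it is rounded before comparison.

```python
import math

def population(n):
    """(juvenile pairs, adult pairs) after n years, starting from v0 = (1, 0),
    computed by applying A = [[0, 1], [1, 1]] to the vector n times."""
    x, y = 1, 0
    for _ in range(n):
        x, y = y, x + y  # one application of A
    return x, y

def closed_form(n):
    """The same two numbers from the diagonalization: (F_{n-1}, F_n)."""
    s5 = math.sqrt(5)
    lam1, lam2 = (1 + s5) / 2, (1 - s5) / 2
    juvenile = (lam1 ** (n - 1) - lam2 ** (n - 1)) / s5
    adult = (lam1 ** n - lam2 ** n) / s5
    return juvenile, adult

for n in range(0, 10):
    print(n, population(n), closed_form(n))
```

For moderate $n$ the two columns agree after rounding; for very large $n$ the floating-point closed form eventually loses precision, while the integer iteration stays exact.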


A final note. Up until now, we have always chosen the field of scalars
to be real. However, it will now sometimes be convenient to change the
field of scalars to be complex, because one gets more eigenvalues and
eigenvectors this way. For instance, consider the matrix

$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

The characteristic polynomial $f(\lambda)$ is given by

$$f(\lambda) = \det\begin{pmatrix} -\lambda & 1 \\ -1 & -\lambda \end{pmatrix} = \lambda^2 + 1.$$

If one restricts the field of scalars to be real, then $f(\lambda)$ has no zeroes,
and so there are no real eigenvalues (and thus no real eigenvectors). On
the other hand, if one expands the field of scalars to be complex, then
$f(\lambda)$ has zeroes at $\lambda = \pm i$, and one can easily show that vectors such as
$\begin{pmatrix} 1 \\ i \end{pmatrix}$ are eigenvectors with eigenvalue $i$, while $\begin{pmatrix} 1 \\ -i \end{pmatrix}$ is an eigenvector
with eigenvalue $-i$. Thus it is sometimes advantageous to introduce
complex numbers into a problem which seems purely concerned with
real numbers, because it can introduce such useful concepts as
eigenvectors and eigenvalues into the situation. (An example of this appears
in Q10 of this week's assignment).
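Python's built-in complex numbers make it easy to confirm the complex eigenvectors mentioned above. This is a small illustrative check, assuming the matrix in question is the rotation-type matrix with rows $(0, 1)$ and $(-1, 0)$; the entries involved are small integers, so the floating-point comparisons are exact here.

```python
def apply(A, v):
    """Apply the matrix A (list of rows) to the vector v; entries may be complex."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def scale(c, v):
    """Multiply the vector v by the scalar c."""
    return [c * x for x in v]

A = [[0, 1], [-1, 0]]
v_plus = [1, 1j]    # candidate eigenvector for eigenvalue i
v_minus = [1, -1j]  # candidate eigenvector for eigenvalue -i

print(apply(A, v_plus), scale(1j, v_plus))     # the two lists agree
print(apply(A, v_minus), scale(-1j, v_minus))  # the two lists agree
```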




Math 115A - Week 8
Textbook sections: 5.2, 6.1
Topics covered:


• Characteristic polynomials
• Tests for diagonalizability
• Inner products
• Inner products and length


* * * * *


Characteristic polynomials 
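• Let $A$ be an $n \times n$ matrix. Last week we introduced the characteristic
polynomial

$$f(\lambda) = \det(A - \lambda I)$$

of that matrix; this is a polynomial in $\lambda$. For instance, if

$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$

then

$$f(\lambda) = \det\begin{pmatrix} a-\lambda & b & c \\ d & e-\lambda & f \\ g & h & i-\lambda \end{pmatrix}$$
$$= (a-\lambda)((e-\lambda)(i-\lambda) - fh) - b(d(i-\lambda) - gf) + c(dh - (e-\lambda)g),$$

which simplifies to some degree 3 polynomial in $\lambda$ (we think of $a, b, c, d, e, f, g, h, i$
as just constant scalars). Last week we saw that the zeroes of this
polynomial give the eigenvalues of $A$.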




• As you can see, the characteristic polynomial looks pretty messy. But
in the special case of a diagonal matrix, e.g.

$$A = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}$$

the characteristic polynomial is quite simple, in fact

$$f(\lambda) = (a-\lambda)(b-\lambda)(c-\lambda)$$

(why?). This has zeroes when $\lambda = a, b, c$, and so the eigenvalues of this
matrix are $a$, $b$, and $c$.


• Lemma 1. Let $A$ and $B$ be similar matrices. Then $A$ and $B$ have the
same characteristic polynomial.


• An algebraist would phrase this as: "the characteristic polynomial is
invariant under similarity".


• Proof. Since $A$ and $B$ are similar, we have $B = QAQ^{-1}$ for some
invertible matrix $Q$. So the characteristic polynomial of $B$ is

$$\det(B - \lambda I) = \det(QAQ^{-1} - \lambda I)$$
$$= \det(QAQ^{-1} - Q\lambda IQ^{-1})$$
$$= \det(Q(A - \lambda I)Q^{-1})$$
$$= \det(Q)\det(A - \lambda I)\det(Q^{-1})$$
$$= \det(Q)\det(Q^{-1})\det(A - \lambda I)$$
$$= \det(QQ^{-1}(A - \lambda I))$$
$$= \det(A - \lambda I)$$

and hence the characteristic polynomials are the same.
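Lemma 1 can be tested on a concrete pair of similar matrices. In this sketch the matrices $A$ and $Q$ are sample choices made for illustration, the function names are hypothetical, and only the $2 \times 2$ case is handled (where the characteristic polynomial is determined by the trace and determinant).

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(Q):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = Q
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(M):
    """Coefficients (c2, c1, c0) of det(M - t I) for a 2x2 matrix M."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

A = [[1, -2], [1, 4]]  # sample matrix
Q = [[2, 1], [1, 1]]   # sample invertible matrix (determinant 1)
B = matmul(matmul(Q, A), inverse_2x2(Q))  # B = Q A Q^{-1} is similar to A

print(B)
print(char_poly_2x2(A), char_poly_2x2(B))  # the two coefficient triples agree
```

Note that $B$ itself looks quite different from $A$; only the characteristic polynomial is preserved.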


• Now let's try to understand the characteristic polynomial for general
matrices. Let $P_1(\mathbf{R})$ be all the polynomials $a\lambda + b$ of degree at most 1;
we shall make the free variable $\lambda$ instead of $x$. Note that all the entries
in the matrix $A - \lambda I$ lie in $P_1(\mathbf{R})$.


• Lemma 2. Let $B$ be an $n \times n$ matrix, all of whose entries lie in $P_1(\mathbf{R})$.
Then $\det(B)$ lies in $P_n(\mathbf{R})$ (i.e. $\det(B)$ is a polynomial in $\lambda$ of degree
at most $n$).




• Proof. We prove this by induction on $n$. When $n = 1$ the claim is
trivial, since a $1 \times 1$ matrix with an entry in $P_1(\mathbf{R})$ looks like
$B = (a\lambda + b)$, and clearly $\det(B) = a\lambda + b \in P_1(\mathbf{R})$.


Now let's suppose inductively that $n > 1$, and that we have already
proved the lemma for $n - 1$. We expand $\det(B)$ using cofactor
expansion along some row or column (it doesn't really matter which row
or column we use). This expands $\det(B)$ as an (alternating-sign) sum
of expressions, each of which is the product of an entry of $B$, and a
cofactor of $B$. The entry of $B$ is in $P_1(\mathbf{R})$, while the cofactor of $B$ is
in $P_{n-1}(\mathbf{R})$ by the induction hypothesis. So each term in $\det(B)$ is in
$P_n(\mathbf{R})$, and so $\det(B)$ is also in $P_n(\mathbf{R})$. This finishes the induction.


• From this lemma we see that $f(\lambda)$ lies in $P_n(\mathbf{R})$, i.e. it is a polynomial of
degree at most $n$. But we can be more precise. In fact the characteristic
polynomial in general looks a lot like the characteristic polynomial of
a diagonal matrix, except for an error which is a polynomial of degree
at most $n - 2$:


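• Lemma 3. Let $n \geq 2$. Let $A$ be the $n \times n$ matrix

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{pmatrix}.$$

Then we have

$$f(\lambda) = (A_{11} - \lambda)(A_{22} - \lambda)\cdots(A_{nn} - \lambda) + g(\lambda)$$

where $g(\lambda) \in P_{n-2}(\mathbf{R})$.

• Proof. Again we induct on $n$. If $n = 2$ then $f(\lambda) = (A_{11} - \lambda)(A_{22} - \lambda) - A_{12}A_{21}$
(why?) and so the claim is true with $g := -A_{12}A_{21} \in P_0(\mathbf{R})$.


Now suppose inductively that $n > 2$, and the claim has already been
proven for $n - 1$. We write out $f(\lambda)$ as

$$f(\lambda) = \det\begin{pmatrix} A_{11} - \lambda & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} - \lambda & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} - \lambda \end{pmatrix}.$$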




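• Now we do cofactor expansion along the first row. The first term in
this expansion is

$$(A_{11} - \lambda)\det\begin{pmatrix} A_{22} - \lambda & \cdots & A_{2n} \\ \vdots & \ddots & \vdots \\ A_{n2} & \cdots & A_{nn} - \lambda \end{pmatrix}.$$

But this determinant is just the characteristic polynomial of an
$(n-1) \times (n-1)$ matrix, and so by the induction hypothesis we have

$$\det\begin{pmatrix} A_{22} - \lambda & \cdots & A_{2n} \\ \vdots & \ddots & \vdots \\ A_{n2} & \cdots & A_{nn} - \lambda \end{pmatrix} = (A_{22} - \lambda)\cdots(A_{nn} - \lambda) + \text{something in } P_{n-3}(\mathbf{R}).$$

Thus the first term in the cofactor expansion is

$$(A_{11} - \lambda)(A_{22} - \lambda)\cdots(A_{nn} - \lambda) + \text{something in } P_{n-2}(\mathbf{R}).$$

(Why did the $P_{n-3}(\mathbf{R})$ become a $P_{n-2}(\mathbf{R})$ when multiplying by
$(A_{11} - \lambda)$?).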


• Now let's look at the second term in the cofactor expansion; this is

$$-A_{12}\det\begin{pmatrix} A_{21} & A_{23} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n3} & \cdots & A_{nn} - \lambda \end{pmatrix}.$$

We do cofactor expansion again, along the first row of this
$(n-1) \times (n-1)$ determinant (the row coming from the second row of $A$).
We can expand this determinant as an alternating-sign
sum of terms, which look like $A_{2i}$ times some $(n-2) \times (n-2)$ determinant.
By Lemma 2, this $(n-2) \times (n-2)$ determinant lies in $P_{n-2}(\mathbf{R})$, while
$A_{2i}$ is a scalar. Thus all the terms in this determinant lie in $P_{n-2}(\mathbf{R})$,
and so the determinant itself must lie in $P_{n-2}(\mathbf{R})$ (recall that $P_{n-2}(\mathbf{R})$
is closed under addition and scalar multiplication). Thus this second
term in the cofactor expansion lies in $P_{n-2}(\mathbf{R})$.


• A similar argument shows that the third, fourth, etc. terms in the
cofactor expansion of $\det(A - \lambda I)$ all lie in $P_{n-2}(\mathbf{R})$. Adding up all
these terms we obtain

$$\det(A - \lambda I) = (A_{11} - \lambda)(A_{22} - \lambda)\cdots(A_{nn} - \lambda) + \text{something in } P_{n-2}(\mathbf{R})$$

as desired.


If we multiply out

$$(A_{11} - \lambda)(A_{22} - \lambda)\cdots(A_{nn} - \lambda)$$

we get

$$(-\lambda)^n + (-\lambda)^{n-1}(A_{11} + A_{22} + \ldots + A_{nn}) + \text{stuff of degree at most } n-2$$

(why?). Note that $(A_{11} + \ldots + A_{nn})$ is just the trace $\mathrm{tr}(A)$ of $A$. Thus
from Lemma 3 we have

$$f(\lambda) = (-1)^n\lambda^n + (-1)^{n-1}\mathrm{tr}(A)\lambda^{n-1} + a_{n-2}\lambda^{n-2} + a_{n-3}\lambda^{n-3} + \ldots + a_1\lambda + a_0$$

for some scalars $a_{n-2},\ldots,a_0$. These coefficients $a_{n-2},\ldots,a_0$ are quite
interesting, but hard to compute. However, $a_0$ can be obtained by a
simple trick: if we evaluate the above expression at 0, we get

$$f(0) = a_0;$$

but $f(0) = \det(A - 0I) = \det(A)$. We have thus proved the following
result.


Theorem 4. The characteristic polynomial $f(\lambda)$ of an $n \times n$ matrix
$A$ has the form

$$f(\lambda) = (-1)^n\lambda^n + (-1)^{n-1}\mathrm{tr}(A)\lambda^{n-1} + a_{n-2}\lambda^{n-2} + a_{n-3}\lambda^{n-3} + \ldots + a_1\lambda + \det(A).$$

Thus the characteristic polynomial encodes the trace and the
determinant, as well as some additional information which we will not study
further in this course.
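Theorem 4 can be checked by computing $\det(A - \lambda I)$ symbolically, following the cofactor-expansion idea used in the proofs of Lemmas 2 and 3. In the sketch below a polynomial is stored as a list of coefficients with the constant term first; this representation, the function names, and the sample matrix are all choices made for illustration.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_det(M):
    """Determinant of a square matrix with polynomial entries,
    by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = [0]
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = poly_mul(M[0][j], poly_det(minor))
        sign = -1 if j % 2 else 1
        total = poly_add(total, [sign * c for c in term])
    return total

def char_poly(A):
    """Coefficients of det(A - t I), constant term first."""
    n = len(A)
    # Diagonal entries become A[i][i] - t, i.e. [A[i][i], -1]; others are constants.
    M = [[[A[i][j], -1] if i == j else [A[i][j]] for j in range(n)]
         for i in range(n)]
    return poly_det(M)

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]  # sample matrix
f = char_poly(A)
trace = sum(A[i][i] for i in range(len(A)))
print(f)             # coefficients a_0, a_1, ..., a_n
print(f[-1], f[-2])  # leading coefficient (-1)^n and (-1)^(n-1) tr(A)
print(f[0], trace)   # constant term det(A), and the trace for comparison
```

Cofactor expansion takes on the order of $n!$ steps, so this is only sensible for small matrices; it is meant to mirror the proof, not to be an efficient algorithm.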


Example. The characteristic polynomial of the $2 \times 2$ matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is

$$(a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)$$

(why?). Note that $a + d$ is the trace and $ad - bc$ is the determinant.




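• Example. The characteristic polynomial of the $3 \times 3$ matrix

$$\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}$$

is

$$(a - \lambda)(b - \lambda)(c - \lambda) = -\lambda^3 + (a + b + c)\lambda^2 - (ab + bc + ca)\lambda + abc.$$

Note that $a + b + c$ is the trace and $abc$ is the determinant.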


• Since the characteristic polynomial is of degree $n$ and has a leading
coefficient of $(-1)^n$, it is possible that it factors into $n$ linear factors, i.e.

$$f(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$$

for some scalars $\lambda_1,\ldots,\lambda_n$ in the field of scalars (which we will call $F$
for a change... this $F$ may be either $\mathbf{R}$ or $\mathbf{C}$). These scalars do not
necessarily have to be distinct (i.e. we can have repeated roots). If this
is the case we say that $f$ splits over $F$, or more simply that $f$ splits.


• Example. The characteristic polynomial of the $2 \times 2$ matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

is $\lambda^2 - 1$ (why?), which splits over the reals as $(\lambda - 1)(\lambda - (-1))$. It also
splits over the complex numbers because $+1$ and $-1$ are real numbers,
and hence also complex numbers. On the other hand, the characteristic
polynomial of

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

is $\lambda^2 + 1$, which doesn't split over the reals, but does split over the
complexes as $(\lambda - i)(\lambda + i)$. Finally, the characteristic polynomial of

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

is $\lambda^2$, which splits over both the reals and the complexes as $(\lambda - 0)(\lambda - 0)$.




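Example The characteristic polynomial of a diagonal matrix will
always split. For instance the characteristic polynomial of

$$\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}$$

is

$$(a - \lambda)(b - \lambda)(c - \lambda) = -(\lambda - a)(\lambda - b)(\lambda - c).$$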


From the previous example, and Lemma 1, we see that the
characteristic polynomial of any diagonalizable matrix will always split (since
diagonalizable matrices are similar to diagonal matrices). In particular,
if the characteristic polynomial of a matrix doesn't split, then it
can't be diagonalizable.


Example. The matrix

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

from an earlier example cannot be diagonalizable over the reals, because
its characteristic polynomial does not split over the reals. (However,
it can be diagonalized over the complex numbers; we leave this as an
exercise).


It turns out that the complex numbers have a significant advantage 
over the reals, in that polynomials always split: 


Fundamental Theorem of Algebra. Every polynomial splits over 
the complex numbers. 


This theorem is a basic reason why the complex numbers are so useful; 
unfortunately, the proof of this theorem is far beyond the scope of this 
course. (You can see a proof in Math 132, however). 


* * * * *


Tests for diagonalizability 


• Recall that an $n \times n$ matrix $A$ is diagonalizable if there is an invertible
matrix $Q$ and a diagonal matrix $D$ such that $A = QDQ^{-1}$. It is often
useful to know when a matrix can be diagonalized. We already know
one such characterization: $A$ is diagonalizable if and only if there is a
basis of $\mathbf{R}^n$ which consists entirely of eigenvectors of $A$. Equivalently:


Lemma 5. An $n \times n$ matrix $A$ is diagonalizable if and only if one can
find $n$ linearly independent vectors $v_1, v_2, \ldots, v_n$ in $\mathbf{R}^n$, such that each
vector $v_j$ is an eigenvector of $A$.


This is because $n$ linearly independent vectors in $\mathbf{R}^n$ automatically
form a basis of $\mathbf{R}^n$.


It is thus important to know when the eigenvectors of A are linearly 
independent. Here is one useful test: 


Proposition 6. Let $A$ be an $n \times n$ matrix. Let $v_1, v_2, \ldots, v_k$ be
eigenvectors of $A$ with eigenvalues $\lambda_1,\ldots,\lambda_k$ respectively. Suppose that
the eigenvalues $\lambda_1,\ldots,\lambda_k$ are all distinct. Then the vectors $v_1,\ldots,v_k$
are linearly independent.


Proof. Suppose for contradiction that $v_1,\ldots,v_k$ were not independent,
i.e. there were some scalars $a_1,\ldots,a_k$, not all equal to zero, such that

$$a_1v_1 + a_2v_2 + \ldots + a_kv_k = 0.$$

At least one of the $a_j$ is non-zero; without loss of generality we may
assume that $a_1$ is non-zero.


Now we use a trick to eliminate $v_k$: We apply $(A - \lambda_k I)$ to both sides
of this equation. Using the fact that $A - \lambda_k I$ is linear, we obtain

$$a_1(A - \lambda_k I)v_1 + a_2(A - \lambda_k I)v_2 + \ldots + a_k(A - \lambda_k I)v_k = 0.$$

But observe that

$$(A - \lambda_k I)v_1 = Av_1 - \lambda_k v_1 = \lambda_1 v_1 - \lambda_k v_1 = (\lambda_1 - \lambda_k)v_1$$

and more generally

$$(A - \lambda_k I)v_j = (\lambda_j - \lambda_k)v_j.$$

In particular we have

$$(A - \lambda_k I)v_k = 0.$$

Putting this all together, we obtain

$$a_1(\lambda_1 - \lambda_k)v_1 + a_2(\lambda_2 - \lambda_k)v_2 + \ldots + a_{k-1}(\lambda_{k-1} - \lambda_k)v_{k-1} = 0.$$

Now we eliminate $v_{k-1}$ by applying $A - \lambda_{k-1}I$ to both sides of the
equation. Arguing as before, we obtain

$$a_1(\lambda_1 - \lambda_k)(\lambda_1 - \lambda_{k-1})v_1 + a_2(\lambda_2 - \lambda_k)(\lambda_2 - \lambda_{k-1})v_2 + \ldots$$
$$+\, a_{k-2}(\lambda_{k-2} - \lambda_k)(\lambda_{k-2} - \lambda_{k-1})v_{k-2} = 0.$$

We then eliminate $v_{k-2}$, then $v_{k-3}$, and so forth all the way down to
eliminating $v_2$, until we obtain

$$a_1(\lambda_1 - \lambda_k)(\lambda_1 - \lambda_{k-1})\cdots(\lambda_1 - \lambda_2)v_1 = 0.$$

But since the $\lambda_j$ are all distinct, and $a_1$ is non-zero, this forces $v_1$ to
equal zero. But this contradicts the definition of eigenvector
(eigenvectors are not allowed to be zero). Thus the vectors $v_1,\ldots,v_k$ must have
been linearly independent.


Proposition 6 holds for linear transformations as well as matrices: see
Theorem 5.10 of the textbook.


Corollary 6 Let $A$ be an $n \times n$ matrix. If the characteristic polynomial
of $A$ splits into $n$ distinct factors, then $A$ is diagonalizable.


Proof. By assumption, the characteristic polynomial $f(\lambda)$ splits as

$$f(\lambda) = (-1)^n(\lambda - \lambda_1)\cdots(\lambda - \lambda_n)$$

for some distinct scalars $\lambda_1,\ldots,\lambda_n$. Thus we have $n$ distinct eigenvalues
$\lambda_1,\ldots,\lambda_n$. For each eigenvalue $\lambda_j$ let $v_j$ be an eigenvector with that
eigenvalue; then by Proposition 6, $v_1,\ldots,v_n$ are linearly independent,
and hence by Lemma 5, $A$ is diagonalizable.




• Example Consider the matrix

$$A = \begin{pmatrix} 1 & -2 \\ 1 & 4 \end{pmatrix}.$$

The characteristic polynomial here is

$$f(\lambda) = (1 - \lambda)(4 - \lambda) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3),$$

so the characteristic polynomial splits into $n$ distinct factors (regardless
of whether our scalar field is the reals or the complexes). So we know
that $A$ is diagonalizable. (If we actually wanted the explicit
diagonalization, we would find the eigenvalues (which are 2, 3) and then some
eigenvectors, and use the previous week's notes).


• To summarize what we know so far: if the characteristic polynomial
doesn't split, then we can't diagonalize the matrix; while if it does split
into distinct factors, then we can diagonalize the matrix. There is still
a remaining case in which the characteristic polynomial splits, but into
repeated factors. Unfortunately this case is much more complicated;
the matrix may or may not be diagonalizable. For instance, the matrix

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

has a characteristic polynomial of $(\lambda - 2)^2$ (why?), so it splits but not
into distinct linear factors. It is clearly diagonalizable (indeed, it is
diagonal). On the other hand, the matrix

$$\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$$

has the same characteristic polynomial of $(\lambda - 2)^2$ (why?), but it turns
out not to be diagonalizable, for the following reason. If it were
diagonalizable, then we could find a basis of $\mathbf{R}^2$ which consists entirely of
eigenvectors. But since the only root of the characteristic polynomial
is 2, the only eigenvalue is 2. Now let's work out what the
eigenvectors are. Since the only eigenvalue is 2, we only need to look in the
eigenspace with eigenvalue 2. We have to solve the equation

$$\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 2\begin{pmatrix} x \\ y \end{pmatrix},$$

i.e. we have to solve the system of equations

$$2x + y = 2x; \qquad 2y = 2y.$$

The general solution of this system occurs when $y = 0$ and $x$ is
arbitrary, so the eigenspace with eigenvalue 2 is just the $x$-axis. But the
vectors from this eigenspace are not enough to span all of $\mathbf{R}^2$, so we
cannot find a basis of eigenvectors. Thus this matrix is not
diagonalizable.


The moral of this story is that, while the characteristic polynomial does 
carry a large amount of information, it does not completely solve the 
problem of whether a matrix is diagonalizable or not. However, even 
when the characteristic polynomial is inconclusive, it is still possible 
to determine whether a matrix is diagonalizable or not by computing 
its eigenspaces and seeing if it is possible to make a basis consisting 
entirely of eigenvectors. We will not pursue the full solution of the 
diagonalization problem here, but defer it to 115B (where you will learn 
about two more tools to study diagonalization - the minimal polynomial 
and the Jordan normal form). 


One last example. Consider the matrix

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix};$$

this is the same matrix as the previous example but we attach another
row and column, and add a 3. (This is not a diagonal matrix, but is
an example of a block-diagonal matrix: see this week's homework for
more information). The characteristic polynomial here is

$$f(\lambda) = -(\lambda - 2)^2(\lambda - 3)$$

(why?), so the eigenvalues are 2 and 3. To find the eigenspace with
eigenvalue 2, we solve the equation

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 2\begin{pmatrix} x \\ y \\ z \end{pmatrix},$$

and a little bit of work shows that the general solution to this equation
occurs when $y = z = 0$ and $x$ is arbitrary, thus the eigenspace is just
the $x$-axis $\left\{ \begin{pmatrix} x \\ 0 \\ 0 \end{pmatrix} : x \in \mathbf{R} \right\}$. Similarly the eigenspace with eigenvalue
3 is the $z$-axis. But this is not enough eigenvectors to span $\mathbf{R}^3$ (the
2-eigenspace only contributes one linearly independent eigenvector, and
the 3-eigenspace contributes only one linearly independent eigenvector,
whereas we need three linearly independent eigenvectors in order to
span $\mathbf{R}^3$). Thus this matrix is not diagonalizable.
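The procedure used in the last few examples (compute each eigenspace, then count how many linearly independent eigenvectors are available) can be sketched in code. The dimension of the eigenspace for $\lambda$ is $n$ minus the rank of $A - \lambda I$. The sketch also relies on the standard fact, going slightly beyond Proposition 6 as stated, that independent eigenvectors taken from different eigenspaces remain independent when combined; the function names are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def eigenspace_dim(A, lam):
    """Dimension of the null space of A - lam * I."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

def has_eigenbasis(A, eigenvalues):
    """True if the eigenspaces of the given distinct eigenvalues together
    supply n linearly independent eigenvectors."""
    return sum(eigenspace_dim(A, lam) for lam in set(eigenvalues)) == len(A)

print(has_eigenbasis([[2, 0], [0, 2]], [2]))                      # diagonal matrix
print(has_eigenbasis([[2, 1], [0, 2]], [2]))                      # only the x-axis
print(has_eigenbasis([[2, 1, 0], [0, 2, 0], [0, 0, 3]], [2, 3]))  # x-axis and z-axis
print(has_eigenbasis([[1, -2], [1, 4]], [2, 3]))                  # distinct eigenvalues
```

The eigenvalues are supplied by hand here (read off from the characteristic polynomial); finding them automatically for larger matrices is a separate problem.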


* * * * *


Inner product spaces 


• We now leave matrices and eigenvalues and eigenvectors for the time
being, and begin a very different topic - the concept of an inner product
space.


• Up until now, we have been preoccupied with vector spaces and
various things that we can do with these vector spaces. If you recall, a
vector space comes equipped with only two basic operations: addition
and scalar multiplication. These operations have already allowed us to
introduce many more concepts (bases, linear transformations, etc.) but
they cannot do everything that one would like to do in applications.


• For instance, how does one compute the length of a vector? In $\mathbf{R}^2$ or
$\mathbf{R}^3$ one can use Pythagoras's theorem to work out the length, but what
about, say, a vector in $P_3(\mathbf{R})$? What is the length of $x^3 + 3x^2 + 6$? It
turns out that such spaces do not have an inherent notion of length:
you can add and scalar multiply two polynomials, but we have not
given any rule to determine the length of a polynomial. Thus, vector
spaces are not equipped to handle certain geometric notions such as
length (or angle, or orthogonality, etc.)


• To resolve this, mathematicians have introduced several "upgraded"
versions of vector spaces, in which you can not only add and scalar
multiply vectors, but can also compute lengths, angles, inner products,
etc. One particularly common such "upgraded" vector space is
something called an inner product space, which we will now discuss. (There
are also normed vector spaces, which have a notion of length but not
angle; topological vector spaces, which have a notion of convergence but
not length; and if you wander into infinite dimensions then there are
slightly fancier things such as Hilbert spaces and Banach spaces. Then
there are vector algebras, where you can multiply vectors with other
vectors to get more vectors. Then there are hybrids of these notions, such
as Banach algebras, which are a certain type of infinite-dimensional
vector algebra. None of these will be covered in this course; they are
mostly graduate level topics).


The problem with length is that it is not particularly linear: the length
of a vector $v + w$ is not just the length of $v$ plus the length of $w$. However,
in $\mathbf{R}^2$ or $\mathbf{R}^3$ we can rewrite the length of a vector $v$ as the square root
of the dot product $v \cdot v$. Unlike length, the dot product is linear in the
sense that $(v + v') \cdot w = v \cdot w + v' \cdot w$ and $v \cdot (w + w') = v \cdot w + v \cdot w'$,
with a similar rule for scalar multiplication. (Actually, to be precise,
the dot product is considered bilinear rather than linear, just as the
determinant is considered multilinear, because it has two inputs $v$ and
$w$, instead of just one for linear transformations).


Thus, the idea behind an inner product space is to introduce length
indirectly, by means of something called an inner product, which is a
generalization of the dot product. Depending on whether the field of
scalars is real or complex, we have either a real inner product space
or a complex inner product space. Complex inner product spaces are
similar to real ones, except the complex conjugate operation $z \mapsto \overline{z}$
makes an appearance. Here's a clue why: the length $|z|$ of a complex
number $z = a + bi$ is not the square root of $z \cdot z$, but is instead the
square root of $z \cdot \overline{z}$.


We will now use both real and complex vector spaces, and will try 
to take care to distinguish between the two. When we just say “vec- 
tor space” without the modifier “real” or “complex”, then the field of 
scalars might be either the reals or the complex numbers. 


Definition An inner product space is a vector space $V$ equipped with an
additional operation, called an inner product, which takes two vectors
$v, w \in V$ as input and returns a scalar $\langle v, w\rangle$ as output, which obeys
the following three properties:


(Linearity in the first variable) For any vectors $v, v', w \in V$ and any
scalar $c$, we have $\langle v + v', w\rangle = \langle v, w\rangle + \langle v', w\rangle$ and $\langle cv, w\rangle = c\langle v, w\rangle$.


(Conjugate symmetry) If $v$ and $w$ are vectors in $V$, then $\langle w, v\rangle$ is the
complex conjugate of $\langle v, w\rangle$: $\langle w, v\rangle = \overline{\langle v, w\rangle}$.


(Positivity) If $v$ is a non-zero vector in $V$, then $\langle v, v\rangle$ is a positive real
number: $\langle v, v\rangle > 0$.


If the field of scalars is real, then every number is its own conjugate
(e.g. $\overline{3} = 3$) and so the conjugate-symmetry property simplifies to just
the symmetry property $\langle w, v\rangle = \langle v, w\rangle$.


We now give some examples of inner product spaces. 


$\mathbf{R}^n$ as an inner product space. We already know that $\mathbf{R}^n$ is a real
vector space. If we now equip $\mathbf{R}^n$ with the inner product equal to the
dot product

$$\langle x, y\rangle := x \cdot y$$

i.e.

$$\langle (x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)\rangle := x_1y_1 + x_2y_2 + \ldots + x_ny_n = \sum_{j=1}^n x_jy_j$$

then we obtain an inner product space. For instance, we now have
$\langle (1,2), (3,4)\rangle = 11$.


To verify that we have an inner product space, we have to verify the
linearity property, conjugate symmetry property, and the positivity
property. To verify the linearity property, observe that

$$\langle x + x', y\rangle = (x + x') \cdot y = x \cdot y + x' \cdot y = \langle x, y\rangle + \langle x', y\rangle$$

and

$$\langle cx, y\rangle = (cx) \cdot y = c(x \cdot y) = c\langle x, y\rangle$$

while the conjugate symmetry follows since

$$\langle y, x\rangle = y \cdot x = x \cdot y = \langle x, y\rangle = \overline{\langle x, y\rangle}$$

(since the conjugate of a real number is itself). To verify the positivity
property, observe that

$$\langle (x_1, x_2, \ldots, x_n), (x_1, x_2, \ldots, x_n)\rangle = x_1^2 + x_2^2 + \ldots + x_n^2$$

which is clearly positive if the $x_1,\ldots,x_n$ are not all zero.


The difference between the dot product and the inner product is that
the dot product is specific to $\mathbf{R}^n$, while the inner product is a more
general concept and is applied to many other vector spaces.


One can interpret the dot product $x \cdot y$ as measuring the amount of
"correlation" or "interaction" between $x$ and $y$; the longer that $x$ and $y$
are, and the more that they point in the same direction, the larger the
dot product becomes. If $x$ and $y$ point in opposing directions then the
dot product is negative, while if $x$ and $y$ point at right angles then the
dot product is zero. Thus the dot product combines both the length
of the vectors, and their angle (as can be seen by the famous formula
$x \cdot y = |x||y|\cos\theta$), but is easier to work with than either length or angle
because it is (bi-)linear (while length and angle individually are not
linear quantities).


$\mathbf{R}^n$ as an inner product space II. One doesn't have to use the dot
product as the inner product; other inner products are possible. For
instance, one could endow $\mathbf{R}^n$ with the non-standard inner product

$$\langle x, y\rangle' := 10\, x \cdot y,$$

so for instance $\langle (1,2), (3,4)\rangle' = 110$. While this is not the standard
inner product, it still obeys the three properties of linearity,
conjugate symmetry, and positivity (why?), so this is still an inner product
(though to avoid confusion we have labeled it as $\langle , \rangle'$ instead of $\langle , \rangle$). The
situation here is similar to bases of vector spaces; a vector space such as
$\mathbf{R}^n$ can have a standard basis but also have several non-standard bases
(for instance, we could multiply every vector in the standard basis by
10), and the same is often true of inner products. However in the vast
majority of cases we will use a standard inner product.
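The two inner products on $\mathbf{R}^n$ discussed above are easy to write down and spot-check. The sketch below is illustrative only: the function names are made up, and checking the axioms on a few sample vectors is of course not a proof.

```python
def dot(x, y):
    """Standard inner product on R^n: x1*y1 + ... + xn*yn."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def scaled_inner(x, y, c=10):
    """Non-standard inner product <x, y>' = c * (x . y), for a positive constant c."""
    if c <= 0:
        raise ValueError("c must be positive, or positivity fails")
    return c * dot(x, y)

x, x2, y = (1, 2), (5, -1), (3, 4)
print(dot(x, y))           # standard inner product of (1, 2) and (3, 4)
print(scaled_inner(x, y))  # ten times as large

# Spot-check the three axioms for the non-standard inner product.
xsum = tuple(a + b for a, b in zip(x, x2))
print(scaled_inner(xsum, y) == scaled_inner(x, y) + scaled_inner(x2, y))  # linearity
print(scaled_inner(y, x) == scaled_inner(x, y))                           # symmetry
print(scaled_inner(x, x) > 0)                                             # positivity
```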




• More generally, we can multiply any inner product by a positive
constant, and still have an inner product.


• $\mathbf{R}$ as an inner product space. A special case of the previous example
of $\mathbf{R}^n$ with the standard inner product occurs when $n = 1$. Then our
inner product space is just the real numbers, and the inner product is
given by the ordinary product: $\langle x, y\rangle := xy$. For instance $\langle 2, 3\rangle = 6$.
Thus, plain old multiplication is itself an example of an inner product.


• $\mathbf{C}$ as an inner product space. Now let's look at the complex
numbers $\mathbf{C}$, which is a one-dimensional complex vector space (so the field of
scalars is now $\mathbf{C}$). Here, we could reason by analogy with the previous
example and guess that $\langle z, w\rangle := zw$ would be an inner product, but
this does not obey either the conjugate-symmetry property or the
positivity property: if $z$ were a complex number, then $\langle z, z\rangle = z^2$ would
not necessarily be a positive real number (or even a real number); for
instance $\langle i, i\rangle = -1$.


• To fix this, the correct way to define an inner product on $\mathbf{C}$ is to
set $\langle z, w\rangle := z\overline{w}$; in other words we have to conjugate the second
factor. This inner product is now linear in the first variable (why?)
and conjugate-symmetric (why?). To verify positivity, observe that
$\langle a + bi, a + bi\rangle = (a + bi)(a - bi) = a^2 + b^2$ which will be a positive real
number if $a + bi$ is non-zero.


• $\mathbf{C}^n$ as an inner product space. Now let's look at the complex vector
space

$$\mathbf{C}^n := \{(z_1, z_2, \ldots, z_n) : z_1, z_2, \ldots, z_n \in \mathbf{C}\}.$$

This is just like $\mathbf{R}^n$ but with the scalars being complex instead of real;
for instance $(3, 1 + i, 3i)$ would lie in $\mathbf{C}^3$ but it wouldn't be a vector in
$\mathbf{R}^3$. We can define an inner product here by

$$\langle (z_1, z_2, \ldots, z_n), (w_1, w_2, \ldots, w_n)\rangle := z_1\overline{w_1} + z_2\overline{w_2} + \ldots + z_n\overline{w_n};$$

note how this definition is a hybrid of the $\mathbf{R}^n$ inner product and the $\mathbf{C}$
inner product. This is an inner product space (why?).
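The role of the conjugate can be seen concretely with Python's built-in complex numbers. This sketch contrasts the naive product (no conjugate) with the inner product defined above; the function names and the sample vectors are chosen for illustration only.

```python
def naive_product(z, w):
    """Sum of z_j * w_j with no conjugation; NOT an inner product on C^n."""
    return sum(a * b for a, b in zip(z, w))

def inner_c(z, w):
    """Inner product on C^n: sum of z_j times the conjugate of w_j."""
    if len(z) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(a * complex(b).conjugate() for a, b in zip(z, w))

print(naive_product([1j], [1j]))  # i * i is negative, so positivity fails
print(inner_c([1j], [1j]))        # i * conj(i) is positive

z = [3, 1 + 1j, 3j]
w = [1j, 2, 1 - 1j]
print(inner_c(z, w), inner_c(w, z))  # complex conjugates of each other
print(inner_c(z, z))                 # a positive real number
```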




e Functions as an inner product space. Consider C(|0,1];R), the 
space of continuous real-valued functions from the interval [0,1] to R. 
This is an infinite-dimensional real vector space, containing such func- 
tions as sin(x), 7 +3, 1/(2+1), and so forth. We can define an inner 
product on this space by defining 


(Fea) = fF Seale) ae 


for instance, 


. 2 3 | es ee 
By oss 1)x2 ee ss ee a 
(x + 1,2”) [et jade (F 3 IIo A389 


Note that we need the continuity property in order to make sure that 
this integral actually makes sense (as opposed to diverging to infinity or 
otherwise doing something peculiar). One can verify fairly easily that 
this is an inner product space; we just give parts of this verification. 
One of the things we have to show is that 


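$$\langle f_1 + f_2, g\rangle = \langle f_1, g\rangle + \langle f_2, g\rangle,$$

but this follows since the left-hand side is

$$\int_0^1 (f_1(x) + f_2(x)) g(x)\, dx = \int_0^1 f_1(x) g(x)\, dx + \int_0^1 f_2(x) g(x)\, dx = \langle f_1, g\rangle + \langle f_2, g\rangle.$$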


To verify positivity, observe that 


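$$\langle f, f\rangle = \int_0^1 f(x)^2\, dx.$$

The function $f(x)^2$ is always non-negative, and if $f$ is not the zero
function on $[0,1]$, then $f(x)^2$ must be strictly positive for some $x \in [0,1]$.
Since $f$ is continuous, $f(x)^2$ is then strictly positive on a small
interval around that $x$. Thus there is a strictly positive area under
the graph of $f(x)^2$, and so $\int_0^1 f(x)^2\, dx > 0$.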


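e One can view this example as an infinite-dimensional version of the
finite-dimensional inner product space example of $\mathbf{R}^n$. To see this,
let $N$ be a very large number. Remembering that integrals can be
approximated by Riemann sums, we have (non-rigorously) that

$$\int_0^1 f(x) g(x)\, dx \approx \frac{1}{N}\sum_{j=1}^N f\left(\frac{j}{N}\right) g\left(\frac{j}{N}\right),$$

or in other words

$$\langle f, g\rangle \approx \frac{1}{N}\left(f\left(\tfrac{1}{N}\right), f\left(\tfrac{2}{N}\right), \ldots, f\left(\tfrac{N}{N}\right)\right) \cdot \left(g\left(\tfrac{1}{N}\right), g\left(\tfrac{2}{N}\right), \ldots, g\left(\tfrac{N}{N}\right)\right),$$

and the right-hand side resembles an example of the inner product on
$\mathbf{R}^N$ (admittedly there is an additional factor of $\frac{1}{N}$, but as observed
before, putting a positive constant factor in the definition of an inner
product just gives you another inner product).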
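
The Riemann-sum heuristic above can be tried numerically. The following Python sketch (the name `riemann_inner` is just a label chosen here) samples $f$ and $g$ at the points $1/N, 2/N, \ldots, N/N$, takes the ordinary dot product of the two sample vectors, and divides by $N$. For $f(x) = x+1$ and $g(x) = x^2$ the values should approach the exact inner product $7/12 \approx 0.5833$ as $N$ grows.

```python
def riemann_inner(f, g, n):
    """(1/n) times the dot product of the samples of f and g at j/n, j = 1..n."""
    fs = [f(j / n) for j in range(1, n + 1)]   # a vector in R^n
    gs = [g(j / n) for j in range(1, n + 1)]   # another vector in R^n
    return sum(a * b for a, b in zip(fs, gs)) / n

f = lambda x: x + 1
g = lambda x: x ** 2

for n in (10, 100, 1000):
    approx = riemann_inner(f, g, n)
    print(n, approx, abs(approx - 7 / 12))     # the error shrinks roughly like 1/n
```

This is only a numerical illustration of the analogy, not a proof; the rate of convergence depends on the functions chosen.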


Functions as an inner product space II. Consider $C([-1,1]; \mathbf{R})$,
the space of continuous real-valued functions on $[-1,1]$. Here we can
define an inner product as

$$\langle f, g\rangle := \int_{-1}^1 f(x) g(x)\, dx.$$

Thus for instance

$$\langle x+1, x^2\rangle = \int_{-1}^1 (x+1)x^2\, dx = \left(\frac{x^4}{4} + \frac{x^3}{3}\right)\Big|_{-1}^1 = \frac{2}{3}.$$

Note that this inner product of $x+1$ and $x^2$ was different from the
inner product of $x+1$ and $x^2$ given in the previous example! Thus it is
important, when dealing with functions, to know exactly what the
domain of the functions is, and when dealing with inner products, to know
exactly which inner product one is using - confusing one inner product
for another can lead to the wrong answer! To avoid confusion, one
sometimes labels the inner product with some appropriate subscript,
for instance the inner product here might be labeled $\langle\, ,\rangle_{C([-1,1];\mathbf{R})}$ and
the previous one labeled $\langle\, ,\rangle_{C([0,1];\mathbf{R})}$.


Functions as an inner product space III. Now consider $C([0,1]; \mathbf{C})$,
the space of continuous complex-valued functions from the interval $[0,1]$
to C; this includes such functions as $\sin(x)$, $x^2 + ix - 3 + i$, $i/(x+1)$,
and so forth. Note that while the range of these functions is complex,
the domain is still real, so $x$ is still a real number. This is an
infinite-dimensional complex vector space (why?). We can define an inner
product on this space as

$$\langle f, g\rangle := \int_0^1 f(x)\overline{g(x)}\, dx.$$

Thus, for instance

$$\langle x, x+i\rangle = \int_0^1 x(x-i)\, dx = \left(\frac{x^3}{3} - \frac{ix^2}{2}\right)\Big|_0^1 = \frac{1}{3} - \frac{i}{2}.$$

This can be easily verified to be an inner product space. For the
positivity, observe that

$$\langle f, f\rangle = \int_0^1 f(x)\overline{f(x)}\, dx = \int_0^1 |f(x)|^2\, dx.$$

Even though $f(x)$ can be any complex number, $|f(x)|^2$ must be a
non-negative real number, and an argument similar to that for real functions
shows that $\int_0^1 |f(x)|^2\, dx$ is a positive real number when $f$ is not the
zero function on $[0,1]$.


Polynomials as an inner product space. The inner products in 
the above three examples work on functions. Since polynomials are a 
special instance of functions, the above inner products are also inner 
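products on polynomials. Thus for instance we can give $P_3(\mathbf{R})$ the
inner product

$$\langle f, g\rangle := \int_0^1 f(x) g(x)\, dx,$$

so that for instance

$$\langle x, x^2\rangle = \int_0^1 x^3\, dx = \frac{x^4}{4}\Big|_0^1 = \frac{1}{4}.$$

Or we could instead give $P_3(\mathbf{R})$ a different inner product

$$\langle f, g\rangle := \int_{-1}^1 f(x) g(x)\, dx,$$

so that for instance

$$\langle x, x^2\rangle = \int_{-1}^1 x^3\, dx = \frac{x^4}{4}\Big|_{-1}^1 = 0.$$
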
Unfortunately, we have here a large variety of inner products and it 
is not obvious what the “standard” inner product should be. Thus 
whenever we deal with polynomials as an inner product space we shall 
be careful to specify exactly which inner product we will use. However, 
we can draw one lesson from this example, which is that if V is an inner 
product space, then any subspace W of V is also an inner product 
space. (Note that if the properties of linearity, conjugate symmetry, 
and positivity hold for the larger space V, then they will automatically 
hold for the smaller space W (why?)). 


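Matrices as an inner product space. Let $M_{m \times n}(\mathbf{R})$ be the space
of real matrices with $m$ rows and $n$ columns; a typical element is

$$A = \begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1n} \\ A_{21} & A_{22} & \ldots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \ldots & A_{mn} \end{pmatrix}.$$

This is an $mn$-dimensional real vector space. We can turn this into an
inner product space by defining the inner product

$$\langle A, B\rangle := \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij},$$

i.e. for every row and column we multiply the corresponding entries of
A and B together, and then sum. For instance,

$$\left\langle \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \right\rangle = 1 \times 5 + 2 \times 6 + 3 \times 7 + 4 \times 8 = 70.$$

It is easy to verify that this is also an inner product; note how similar
this is to the standard $\mathbf{R}^n$ inner product. This inner product can also
be written using transposes and traces:

$$\langle A, B\rangle = \mathrm{tr}(AB^t).$$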




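To see this, note that $AB^t$ is an $m \times m$ matrix, with the top left entry
being $A_{11}B_{11} + A_{12}B_{12} + \ldots + A_{1n}B_{1n}$, the second entry on the diagonal
being $A_{21}B_{21} + A_{22}B_{22} + \ldots + A_{2n}B_{2n}$, and so forth down the diagonal
(why?). Adding these together we obtain $\langle A, B\rangle$. For instance,

$$\mathrm{tr}\left( \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix} \right) = \mathrm{tr} \begin{pmatrix} 1 \times 5 + 2 \times 6 & 1 \times 7 + 2 \times 8 \\ 3 \times 5 + 4 \times 6 & 3 \times 7 + 4 \times 8 \end{pmatrix}$$

$$= 1 \times 5 + 2 \times 6 + 3 \times 7 + 4 \times 8 = \left\langle \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \right\rangle.$$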
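
The agreement between the entrywise sum and the trace formula can be checked mechanically. Here is a small Python sketch using nested lists for matrices; the helper names (`entrywise_inner`, `transpose`, `matmul`, `trace`) are ad hoc choices for this illustration, not notation from the course.

```python
def entrywise_inner(A, B):
    """Sum of A_ij * B_ij over all rows i and columns j."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Ordinary matrix product of A (m x n) and B (n x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(entrywise_inner(A, B))              # 70
print(trace(matmul(A, transpose(B))))     # 70

# The same identity for a non-square (2 x 3) pair: A B^t is then 2 x 2.
C = [[1, 0, 2], [0, 3, 1]]
D = [[4, 1, 0], [2, 2, 5]]
print(entrywise_inner(C, D) == trace(matmul(C, transpose(D))))
```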


Matrices as an inner product space II. Let $M_{m \times n}(\mathbf{C})$ be the
space of complex matrices with $m$ rows and $n$ columns; this is an
$mn$-dimensional complex vector space. This is an inner product space with
inner product

$$\langle A, B\rangle := \sum_{i=1}^m \sum_{j=1}^n A_{ij}\overline{B_{ij}}.$$

It is not hard to verify that this is indeed an inner product space. Unlike
the previous example, the inner product is not given by the formula
$\langle A, B\rangle = \mathrm{tr}(AB^t)$; however, there is a very similar formula. Define the
adjoint $B^\dagger$ of a matrix $B$ to be the complex conjugate of the transpose
$B^t$; i.e. $B^\dagger$ is the same matrix as $B^t$ but with every entry replaced by
its complex conjugate. For instance,

$$\begin{pmatrix} 1+2i & 3+4i \\ 5+6i & 7+8i \\ 9+10i & 11+12i \end{pmatrix}^\dagger = \begin{pmatrix} 1-2i & 5-6i & 9-10i \\ 3-4i & 7-8i & 11-12i \end{pmatrix}.$$

The adjoint is the complex version of the transpose; it is completely
unrelated to the adjugate matrix in the previous week's notes. It is
easy to verify that $\langle A, B\rangle = \mathrm{tr}(AB^\dagger)$.


(Optional remarks) To summarize: many of the vector spaces we have 
encountered before, can be upgraded to inner product spaces. As we 
shall see, the additional capabilities of inner product spaces can be 
useful in many applications when the more basic capabilities of vector
spaces are not enough. On the other hand, inner products add
complexity (and possible confusion, if there is more than one choice
of inner product available) to a problem, and in many situations (for 
instance, in the last six weeks of material) they are unnecessary. So 
it is sometimes better to just deal with bare vector spaces, with no 
inner product attached; it depends on the situation. (A more subtle 
reason why sometimes adding extra structure can be bad, is because 
it reduces the amount of symmetry in a situation; it is relatively easy 
for a transformation to be linear (i.e. it preserves the vector space 
structures of addition and scalar multiplication) but it is much harder 
to be isometric (which means that it not only preserves addition and 
scalar multiplication, but also inner products as well.). So if one insists 
on dealing with inner products all the time, then one loses a lot of 
symmetries, because there is more structure to preserve, and this can 
sometimes make a problem appear harder than it actually is. Some of 
the deepest advances in physics, for instance, particularly in relativ- 
ity and quantum mechanics, were only possible because the physicists 
removed a lot of unnecessary structure from their models (e.g. in rel- 
ativity they removed separate structures for space and time, keeping 
only something called the spacetime metric), and then gained so much 
additional symmetry that they could then use those symmetries to dis- 
cover new laws of physics (e.g. Einstein’s law of gravitation).) 


Some basic properties of inner products: 


From the linearity and conjugate symmetry properties it is easy to see
that $\langle v, w + w'\rangle = \langle v, w\rangle + \langle v, w'\rangle$ and $\langle v, cw\rangle = \overline{c}\langle v, w\rangle$ for all vectors
$v, w, w'$ and scalars $c$ (why?). Note that when you pull a scalar $c$ out of
the second factor, it gets conjugated, so be careful about that. (Another
way of saying this is that the inner product is conjugate linear, rather
than linear, in the second variable. Because the inner product is linear
in the first variable and only sort-of-linear in the second, it is sometimes
said that the inner product is sesquilinear (sesqui is Latin for “one and
a half”)).


The inner product of 0 with anything is 0: $\langle 0, v\rangle = \langle v, 0\rangle = 0$. (This is
an easy consequence of the linearity (or conjugate linearity) - why?).
In particular, $\langle 0, 0\rangle = 0$. Thus, by the positivity property, $\langle v, v\rangle$ is
positive if $v$ is non-zero and zero if $v$ is zero. In particular, if we ever
know that $\langle v, v\rangle = 0$, we can deduce that $v$ itself is 0.


Inner products and length 


Once you have an inner product space, you can then define a notion of 
length: 


Definition Let V be an inner product space, and let v be a vector
of V. Then the length of v, denoted $\|v\|$, is given by the formula
$\|v\| := \sqrt{\langle v, v\rangle}$. (In particular, $\langle v, v\rangle = \|v\|^2$).


Example In $\mathbf{R}^2$ with the standard inner product, the vector $(3,4)$ has
length

$$\|(3,4)\| = \sqrt{\langle (3,4), (3,4)\rangle} = \sqrt{3^2 + 4^2} = 5.$$

If instead we use the non-standard inner product $\langle x, y\rangle = 10\, x \cdot y$, then
the length is now

$$\|(3,4)\| = \sqrt{\langle (3,4), (3,4)\rangle} = \sqrt{10(3^2 + 4^2)} = 5\sqrt{10}.$$


Thus the notion of length depends very much on what inner product 
you choose (although in most cases this will not be an issue since we 
will use the standard inner product). 


From the positivity property we see that every non-zero vector has a 
positive length, while the zero vector has zero length. Thus ||v|| = 0 if 
and only if v = 0. 


If c is a scalar, then

$$\|cv\| = \sqrt{\langle cv, cv\rangle} = \sqrt{c\overline{c}\langle v, v\rangle} = \sqrt{|c|^2 \|v\|^2} = |c|\,\|v\|.$$


Example In a complex vector space, the vector $(3+4i)v$ is five times
as long as v. The vector $-v$ has exactly the same length as v.


The inner product in some special cases can be expressed in terms of
length. We already know that $\langle v, v\rangle = \|v\|^2$. More generally, if w is a
positive scalar multiple of v (so that v and w are parallel and in the
same direction), then $w = cv$ for some positive real number c, and

$$\langle v, w\rangle = \langle v, cv\rangle = c\langle v, v\rangle = c\|v\|^2 = \|v\|\,\|cv\| = \|v\|\,\|w\|,$$

i.e. when v and w are pointing in the same direction then the inner
product is just the product of the norms. In the other extreme, if w is
a negative scalar multiple of v, then $w = -cv$ for some positive c, and

$$\langle v, w\rangle = \langle v, -cv\rangle = -c\langle v, v\rangle = -c\|v\|^2 = -\|v\|\,|-c|\,\|v\| = -\|v\|\,\|-cv\| = -\|v\|\,\|w\|,$$

and so the inner product is negative the product of the norms. In
general the inner product lies in between these two extremes:


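Cauchy-Schwarz inequality Let V be an inner product space. For
any $v, w \in V$, we have

$$|\langle v, w\rangle| \leq \|v\|\,\|w\|.$$


Proof. If $w = 0$ then both sides are zero, so we can assume that $w \neq 0$.
From the positivity property we know that $\langle v, v\rangle \geq 0$. More generally,
for any scalars $a, b$ we know that $\langle av + bw, av + bw\rangle \geq 0$. But

$$\langle av + bw, av + bw\rangle = a\langle v, av + bw\rangle + b\langle w, av + bw\rangle$$

$$= a\overline{a}\langle v, v\rangle + a\overline{b}\langle v, w\rangle + b\overline{a}\langle w, v\rangle + b\overline{b}\langle w, w\rangle$$

$$= |a|^2\|v\|^2 + a\overline{b}\langle v, w\rangle + \overline{a}b\overline{\langle v, w\rangle} + |b|^2\|w\|^2.$$

Since $\langle av + bw, av + bw\rangle \geq 0$ for any choice of scalars $a, b$, we thus have

$$|a|^2\|v\|^2 + a\overline{b}\langle v, w\rangle + \overline{a}b\overline{\langle v, w\rangle} + |b|^2\|w\|^2 \geq 0$$

for any choice of scalars $a, b$. We now select $a$ and $b$ in order to obtain
some cancellation. Specifically, we set

$$a := \|w\|^2; \quad b := -\langle v, w\rangle.$$

Then we see that

$$\|w\|^4\|v\|^2 - \|w\|^2\overline{\langle v, w\rangle}\langle v, w\rangle - \langle v, w\rangle\|w\|^2\overline{\langle v, w\rangle} + |\langle v, w\rangle|^2\|w\|^2 \geq 0;$$

this simplifies to

$$\|w\|^4\|v\|^2 \geq \|w\|^2|\langle v, w\rangle|^2.$$

Dividing by $\|w\|^2$ (recall that w is non-zero, so that $\|w\|$ is non-zero)
and taking square roots we obtain the desired inequality.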


Thus for real vector spaces, the inner product $\langle v, w\rangle$ always lies
somewhere between $+\|v\|\|w\|$ and $-\|v\|\|w\|$. For complex vector spaces,
the inner product $\langle v, w\rangle$ can lie anywhere in the disk centered at the
origin with radius $\|v\|\|w\|$. For instance, in the complex vector space
C, if $v = 3+4i$ and $w = 4-3i$ then $\langle v, w\rangle = v\overline{w} = 25i$, while $\|v\| = 5$
and $\|w\| = 5$.


The Cauchy-Schwarz inequality is extremely useful, especially in
analysis; it tells us that if one of the vectors v, w has small length then
their inner product will also be small (unless of course the other vector
has very large length).
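
As a numerical sanity check (not a substitute for the proof), the following Python sketch evaluates both sides of the Cauchy-Schwarz inequality for a few vectors, using the standard inner product on $\mathbf{C}^n$ with the conjugate on the second factor. The helper names `inner`, `norm` and `cauchy_schwarz_gap` are illustrative only.

```python
import math

def inner(v, w):
    """Standard inner product on C^n (conjugate on the second factor)."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm(v):
    """Length of v, i.e. the square root of <v, v>."""
    return math.sqrt(inner(v, v).real)

def cauchy_schwarz_gap(v, w):
    """||v|| ||w|| - |<v, w>|; Cauchy-Schwarz says this is never negative."""
    return norm(v) * norm(w) - abs(inner(v, w))

# The example from the text: v = 3+4i and w = 4-3i in C.
v, w = (3 + 4j,), (4 - 3j,)
print(inner(v, w), norm(v), norm(w))       # inner product 25i, both lengths 5

# A real example where the inequality is strict (v and w are not parallel).
print(cauchy_schwarz_gap((1, 2, 3), (4, -5, 6)) > 0)

# When w is a scalar multiple of v, the gap is zero up to rounding error.
print(abs(cauchy_schwarz_gap((1, 2, 3), (2, 4, 6))) < 1e-9)
```

In the first example $|\langle v, w\rangle| = 25 = \|v\|\|w\|$, so the inner product lies on the boundary of the disk of radius $\|v\|\|w\|$; this is consistent with $w = -iv$ being a scalar multiple of $v$.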


Another fundamental inequality concerns the relationship between length
and vector addition. It is clear that length is not linear: the length of
$v + w$ is not, in general, just the sum of the length of v and the length of w. For
instance, in $\mathbf{R}^2$, if $v := (1,0)$ and $w := (0,1)$ then $\|v + w\| = \|(1,1)\| = \sqrt{2} \neq 1 + 1 = \|v\| + \|w\|$. However, we do have

Triangle inequality Let V be an inner product space. For any $v, w \in V$,
we have

$$\|v + w\| \leq \|v\| + \|w\|.$$


Proof. To prove this inequality, we can square both sides (note that
this is OK since both sides are non-negative):

$$\|v + w\|^2 \leq (\|v\| + \|w\|)^2.$$

The left-hand side is

$$\langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle.$$

The quantities $\langle v, v\rangle$ and $\langle w, w\rangle$ are just $\|v\|^2$ and $\|w\|^2$ respectively.
From the Cauchy-Schwarz inequality, the two quantities $\langle v, w\rangle$ and
$\langle w, v\rangle$ have absolute value at most $\|v\|\|w\|$. Thus

$$\langle v + w, v + w\rangle \leq \|v\|^2 + \|v\|\|w\| + \|v\|\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2$$

as desired.


The reason this is called the triangle inequality is because it has a 
natural geometric interpretation: if one calls two sides of a triangle v 
and w, so that the third side is $v + w$, then the triangle inequality says
that the length of the third side is less than or equal to the sum of 
the lengths of the other two sides. In other words, a straight line has 
the shortest distance between two points (at least when compared to 
triangular alternatives). 


The triangle inequality has a couple of variants. Here are a few: 


$\|v - w\| \leq \|v\| + \|w\|$
$\|v + w\| \geq \|v\| - \|w\|$
$\|v + w\| \geq \|w\| - \|v\|$
$\|v - w\| \geq \|v\| - \|w\|$
$\|v - w\| \geq \|w\| - \|v\|$


Thus for instance, if v has length 10, and w has length 3, then both v+w 
and v — w have length somewhere between 7 and 13. (Can you see this 
geometrically?). These inequalities can be proven in a similar manner 
to the original triangle inequality, or alternatively one can start with 
the original triangle inequality and do some substitutions (e.g. replace 
w by —w, or replace v by v — w, on both sides of the inequality; try 
this!). 
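
The claim about lengths 10 and 3 can be explored numerically. In the Python sketch below (an illustration in $\mathbf{R}^2$ with the standard inner product; the variable names are arbitrary), $v$ is fixed with length 10 and $w$ is rotated through twelve directions while keeping length 3. Every value of $\|v + w\|$ and $\|v - w\|$ should land between 7 and 13, with the extremes attained when $w$ is parallel to $v$.

```python
import math

def norm(v):
    """Standard Euclidean length in R^2."""
    return math.hypot(v[0], v[1])

v = (10.0, 0.0)                       # a vector of length 10
lengths = []
for k in range(12):
    theta = 2 * math.pi * k / 12      # direction of w
    w = (3 * math.cos(theta), 3 * math.sin(theta))   # a vector of length 3
    plus = norm((v[0] + w[0], v[1] + w[1]))
    minus = norm((v[0] - w[0], v[1] - w[1]))
    lengths.append((plus, minus))
    print(f"theta = {theta:5.2f}   |v+w| = {plus:6.3f}   |v-w| = {minus:6.3f}")
```

Geometrically, $v + w$ traces out a circle of radius 3 centered at the tip of $v$, and the nearest and farthest points of that circle from the origin are at distances 7 and 13.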




Math 115A - Week 9 
Textbook sections: 6.1-6.2 
Topics covered: 


e Orthogonality
e Orthonormal bases
e Gram-Schmidt orthogonalization
e Orthogonal complements


* * * * *


Orthogonality 


e From your lower-division vector calculus you know that two vectors
$v, w$ in $\mathbf{R}^2$ or $\mathbf{R}^3$ are perpendicular if and only if $v \cdot w = 0$; for instance,
$(3,4)$ and $(-4,3)$ are perpendicular.


e Now that we have inner products - a generalization of dot products - 
we can now give a similar notion for all inner product spaces. 


e Definition. Let V be an inner product space. If v, w are vectors in V,
we say that v and w are orthogonal if $\langle v, w\rangle = 0$.


e Example. In $\mathbf{R}^4$ (with the standard inner product), the vectors $(1,1,0,0)$
and $(0,0,1,1)$ are orthogonal, as are $(1,1,1,1)$ and $(1,-1,1,-1)$, but
the vectors $(1,1,0,0)$ and $(1,0,1,0)$ are not orthogonal. In $\mathbf{C}^2$, the
vectors $(1,i)$ and $(1,-i)$ are orthogonal, but $(1,0)$ and $(i,0)$ are not.


e Example. In any inner product space, the 0 vector is orthogonal to
everything (why?). On the other hand, a non-zero vector cannot be
orthogonal to itself (why? Recall that $\langle v, v\rangle = \|v\|^2$).


e Example. In $C([0,1]; \mathbf{C})$ with the inner product

$$\langle f, g\rangle := \int_0^1 f(x)\overline{g(x)}\, dx,$$

the functions 1 and $x - \frac{1}{2}$ are orthogonal (why?), but 1 and $x$ are not.
However, in $C([-1,1]; \mathbf{C})$ with the inner product

$$\langle f, g\rangle := \int_{-1}^1 f(x)\overline{g(x)}\, dx,$$

the functions 1 and $x - \frac{1}{2}$ are no longer orthogonal; however, the
functions 1 and $x$ now are. Thus the question of whether two vectors are
orthogonal depends on which inner product you use.


Sometimes we say that v and w are perpendicular instead of orthogonal.
This makes the most sense for $\mathbf{R}^n$, but can be a bit confusing when
dealing with other inner product spaces such as $C([-1,1]; \mathbf{C})$ - how
would one visualize the functions 1 and $x$ being “perpendicular”, for
instance (or $i$ and $x$, for that matter)? So I prefer to use the word
orthogonal when dealing with general inner product spaces.


Sometimes we write $v \perp w$ to denote the fact that v is orthogonal to
w.


Being orthogonal is at the opposite extreme of being parallel; recall
from the Cauchy-Schwarz inequality that $|\langle v, w\rangle|$ must lie between 0
and $\|v\|\|w\|$. When v and w are parallel then $|\langle v, w\rangle|$ attains its
maximum possible value of $\|v\|\|w\|$, while when v and w are orthogonal
then $|\langle v, w\rangle|$ attains its minimum value of 0.


Orthogonality is symmetric: if v is orthogonal to w then w is orthogonal
to v. (Why? Use the conjugate symmetry property and the fact that 
the conjugate of 0 is 0). 


Orthogonality is preserved under linear combinations: 


Lemma 1. Suppose that $v_1, \ldots, v_n$ are vectors in an inner product
space V, and suppose that w is a vector in V which is orthogonal to all
of $v_1, v_2, \ldots, v_n$. Then w is also orthogonal to any linear combination
of $v_1, \ldots, v_n$.


Proof Let $a_1 v_1 + \ldots + a_n v_n$ be a linear combination of $v_1, \ldots, v_n$. Then
by linearity

$$\langle a_1 v_1 + \ldots + a_n v_n, w\rangle = a_1\langle v_1, w\rangle + \ldots + a_n\langle v_n, w\rangle.$$

But since w is orthogonal to each of $v_1, \ldots, v_n$, all the terms on the
right-hand side are zero. Thus w is orthogonal to $a_1 v_1 + \ldots + a_n v_n$ as
desired.


In particular, if v and w are orthogonal, then cv and w are also
orthogonal for any scalar c (why is this a special case of Lemma 1?)


You are all familiar with the following theorem about orthogonality. 


Pythagoras's theorem If v and w are orthogonal vectors, then
$\|v + w\|^2 = \|v\|^2 + \|w\|^2$.


Proof. We compute

$$\|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle.$$

But since v and w are orthogonal, $\langle v, w\rangle$ and $\langle w, v\rangle$ are zero. Since
$\langle v, v\rangle = \|v\|^2$ and $\langle w, w\rangle = \|w\|^2$, we obtain $\|v + w\|^2 = \|v\|^2 + \|w\|^2$ as
desired.


This theorem can be generalized: 


Generalized Pythagoras's theorem If $v_1, v_2, \ldots, v_n$ are all orthogonal
to each other (i.e. $v_i \perp v_j$ for all $i \neq j$) then

$$\|v_1 + v_2 + \ldots + v_n\|^2 = \|v_1\|^2 + \|v_2\|^2 + \ldots + \|v_n\|^2.$$


Proof. We prove this by induction. If $n = 1$ the claim is trivial, and
for $n = 2$ this is just the ordinary Pythagoras theorem. Now suppose
that $n > 2$, and the claim has already been proven for $n - 1$. From
Lemma 1 we know that $v_n$ is orthogonal to $v_1 + \ldots + v_{n-1}$, so

$$\|v_1 + v_2 + \ldots + v_n\|^2 = \|v_1 + \ldots + v_{n-1}\|^2 + \|v_n\|^2.$$

On the other hand, by the induction hypothesis we know that

$$\|v_1 + \ldots + v_{n-1}\|^2 = \|v_1\|^2 + \ldots + \|v_{n-1}\|^2.$$

Combining the two equations we obtain

$$\|v_1 + v_2 + \ldots + v_n\|^2 = \|v_1\|^2 + \|v_2\|^2 + \ldots + \|v_n\|^2$$

as desired.




e Recall that if two vectors are orthogonal, then they remain orthogonal 
even when you multiply one or both of them by a scalar. So we have 


e Corollary 2. If $v_1, v_2, \ldots, v_n$ are all orthogonal to each other (i.e.
$v_i \perp v_j$ for all $i \neq j$) and $a_1, \ldots, a_n$ are scalars, then

$$\|a_1 v_1 + a_2 v_2 + \ldots + a_n v_n\|^2 = |a_1|^2\|v_1\|^2 + |a_2|^2\|v_2\|^2 + \ldots + |a_n|^2\|v_n\|^2.$$


e Definition A collection $(v_1, v_2, \ldots, v_n)$ of vectors is said to be
orthogonal if every pair of vectors is orthogonal to each other (i.e. $\langle v_i, v_j\rangle = 0$
for all $i \neq j$). If a collection is orthogonal, and furthermore each vector
has length 1 (i.e. $\|v_i\| = 1$ for all $i$) then we say that the collection is
orthonormal.


e Example In $\mathbf{R}^4$, the collection $((3,0,0,0), (0,4,0,0), (0,0,5,0))$ is
orthogonal but not orthonormal. But the collection $((1,0,0,0), (0,1,0,0), (0,0,1,0))$
is orthonormal (and therefore orthogonal). Note that any single vector
$v_1$ is always considered an orthogonal collection (why?).


e Corollary 3. If $(v_1, v_2, \ldots, v_n)$ is an orthonormal collection of vectors,
and $a_1, \ldots, a_n$ are scalars, then

$$\|a_1 v_1 + a_2 v_2 + \ldots + a_n v_n\|^2 = |a_1|^2 + |a_2|^2 + \ldots + |a_n|^2.$$

Note that the right-hand side $|a_1|^2 + |a_2|^2 + \ldots + |a_n|^2$ is always positive,
unless $a_1, \ldots, a_n$ are all zero. Thus $a_1 v_1 + \ldots + a_n v_n$ is always non-zero,
unless $a_1, \ldots, a_n$ are all zero. Thus


e Corollary 4. Every orthonormal collection of vectors is linearly inde- 
pendent. 


* * * * *


Orthonormal bases 


e As we have seen, orthonormal collections of vectors have many nice 
properties. As we shall see, things are even better when this collection 
is also a basis: 




Definition An orthonormal basis of an inner product space V is a
collection $(v_1, \ldots, v_n)$ of vectors which is orthonormal and is also an
ordered basis.


Example. In $\mathbf{R}^4$, the collection $((1,0,0,0), (0,1,0,0), (0,0,1,0))$ is
orthonormal but is not a basis. However, the collection
$((1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1))$
is an orthonormal basis. More generally, the standard ordered basis of
$\mathbf{R}^n$ is always an orthonormal basis, as is the standard ordered basis of
$\mathbf{C}^n$. (Actually, the standard bases of $\mathbf{R}^n$ and of $\mathbf{C}^n$ are the same; only
the field of scalars is different). The collection
$((1,0,0,0), (1,1,0,0), (1,1,1,0), (1,1,1,1))$
is a basis of $\mathbf{R}^4$ but is not an orthonormal basis.


From Corollary 4 we have 


Corollary 5 Let $(v_1, \ldots, v_n)$ be an orthonormal collection of vectors
in an n-dimensional inner product space. Then $(v_1, \ldots, v_n)$ is an
orthonormal basis.


Proof. This is just because any n linearly independent vectors in an
n-dimensional space automatically form a basis.


Example Consider the vectors $(3/5, 4/5)$ and $(-4/5, 3/5)$ in $\mathbf{R}^2$. It is
easy to check that they have length 1 and are orthogonal. Since $\mathbf{R}^2$ is
two-dimensional, they thus form an orthonormal basis.


Let $(v_1, \ldots, v_n)$ be an ordered basis of an n-dimensional inner product
space V. Since $(v_1, \ldots, v_n)$ is a basis, we know that every vector v in
V can be written as a linear combination of $v_1, \ldots, v_n$:

$$v = a_1 v_1 + \ldots + a_n v_n.$$


In general, finding these scalars a,,...,a, can be tedious, and often 
requires lots of Gaussian elimination. (Try writing (1,0,0,0) as a lin- 
ear combination of (1,1,1,1), (1,2,3,4), (2,2,1,1) and (1, 2,1, 2), for 
instance). However, if we know that the basis is an orthonormal basis, 
then finding these coefficients is much easier. 


Theorem 6. Let $(v_1, \ldots, v_n)$ be an orthonormal basis of an inner
product space V. Then for any vector $v \in V$, we have

$$v = a_1 v_1 + a_2 v_2 + \ldots + a_n v_n$$

where the scalars $a_1, \ldots, a_n$ are given by the formula

$$a_j := \langle v, v_j\rangle \text{ for all } j = 1, \ldots, n.$$

Proof. Since $(v_1, \ldots, v_n)$ is a basis, we know that $v = a_1 v_1 + \ldots + a_n v_n$
for some scalars $a_1, \ldots, a_n$. To finish the proof we have to solve for
$a_1, \ldots, a_n$ and verify that $a_j = \langle v, v_j\rangle$ for all $j = 1, \ldots, n$. To do this
we take our equation for v and take inner products of both sides with
$v_j$:

$$\langle v, v_j\rangle = \langle a_1 v_1 + a_2 v_2 + \ldots + a_n v_n, v_j\rangle.$$

We expand out the right-hand side as

$$a_1\langle v_1, v_j\rangle + a_2\langle v_2, v_j\rangle + \ldots + a_n\langle v_n, v_j\rangle.$$

Since $v_1, \ldots, v_n$ are orthogonal, all the inner products vanish except
$\langle v_j, v_j\rangle = \|v_j\|^2$. But $\|v_j\| = 1$ since $v_1, \ldots, v_n$ is also orthonormal. So
we get

$$\langle v, v_j\rangle = 0 + \ldots + 0 + a_j \times 1 + 0 + \ldots + 0 = a_j$$

as desired.


From the definition of co-ordinate vector $[v]^\beta$, we thus have a simple
way to compute co-ordinate vectors:


Corollary 7 Let $\beta = (v_1, \ldots, v_n)$ be an orthonormal basis of an inner
product space V. Then the co-ordinate vector $[v]^\beta$ of any vector v is
given by

$$[v]^\beta = \begin{pmatrix} \langle v, v_1\rangle \\ \vdots \\ \langle v, v_n\rangle \end{pmatrix}.$$

Note that Corollary 7 also gives us a relatively quick way to compute
the co-ordinate matrix $[T]_\beta^\gamma$ of a linear operator $T: V \to W$ provided
that $\gamma$ is an orthonormal basis, since the columns of $[T]_\beta^\gamma$ are just $[Tv_j]^\gamma$,
where $v_j$ are the basis vectors of $\beta$.




e Example. Let $v_1 := (1,0,0)$, $v_2 := (0,1,0)$, $v_3 := (0,0,1)$, and $v := (3,4,5)$.
Then $(v_1, v_2, v_3)$ is an orthonormal basis of $\mathbf{R}^3$. Thus

$$v = a_1 v_1 + a_2 v_2 + a_3 v_3,$$

where $a_1 := \langle v, v_1\rangle = 3$, $a_2 := \langle v, v_2\rangle = 4$, and $a_3 := \langle v, v_3\rangle = 5$.
(Of course, in this case one could expand v as a linear combination of
$v_1, v_2, v_3$ just by inspection.)


e Example. Let $v_1 := (3/5, 4/5)$, $v_2 := (-4/5, 3/5)$, and $v := (1,0)$.
Then $(v_1, v_2)$ is an orthonormal basis for $\mathbf{R}^2$. Now suppose we want to
write v as a linear combination of $v_1$ and $v_2$. We could use Gaussian
elimination, but because our basis is orthonormal we can use Theorem 6
instead to write

$$v = a_1 v_1 + a_2 v_2$$

where $a_1 := \langle v, v_1\rangle = 3/5$ and $a_2 := \langle v, v_2\rangle = -4/5$. Thus
$v = \frac{3}{5} v_1 - \frac{4}{5} v_2$. (Try doing the same thing using Gaussian elimination, and
see how much longer it would take!). Equivalently, we have

$$[v]^{(v_1, v_2)} = \begin{pmatrix} 3/5 \\ -4/5 \end{pmatrix}.$$
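
The recipe of Theorem 6 is easy to mechanize. The Python sketch below redoes the last example with exact rational arithmetic (via the standard-library `fractions` module); `dot` stands for the standard inner product on $\mathbf{R}^2$, and the other names are chosen for this illustration only.

```python
from fractions import Fraction as F

def dot(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

v1 = (F(3, 5), F(4, 5))
v2 = (F(-4, 5), F(3, 5))
v = (F(1), F(0))

# (v1, v2) is orthonormal: unit lengths and zero inner product.
print(dot(v1, v1), dot(v2, v2), dot(v1, v2))      # 1 1 0

# Theorem 6: the coefficients are just inner products with the basis vectors.
a1, a2 = dot(v, v1), dot(v, v2)
print(a1, a2)                                     # 3/5 -4/5

# Recombining a1*v1 + a2*v2 recovers v exactly.
recombined = tuple(a1 * x + a2 * y for x, y in zip(v1, v2))
print(recombined == v)                            # True
```

No Gaussian elimination is needed; each coefficient costs one inner product. This shortcut relies on the basis being orthonormal, and would give wrong coefficients for a general basis.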


e The example of Fourier series. We now give an example which 
is important in many areas of mathematics (though we won’t use it 
much in this particular course) - the example of Fourier series. Let 
$C([0,1]; \mathbf{C})$ be the inner product space of continuous complex-valued
functions on the interval $[0,1]$, with the inner product

$$\langle f, g\rangle := \int_0^1 f(x)\overline{g(x)}\, dx.$$

Now consider the functions $\ldots, v_{-3}, v_{-2}, v_{-1}, v_0, v_1, v_2, v_3, \ldots$ in $C([0,1]; \mathbf{C})$
defined by

$$v_k(x) := e^{2\pi i k x};$$

these functions are sometimes known as complex harmonics.




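e Observe that these functions all have length 1:

$$\|v_k\| = \langle v_k, v_k\rangle^{1/2} = \left(\int_0^1 v_k(x)\overline{v_k(x)}\, dx\right)^{1/2}$$

$$= \left(\int_0^1 e^{2\pi i k x} e^{-2\pi i k x}\, dx\right)^{1/2} = \left(\int_0^1 1\, dx\right)^{1/2} = 1.$$

Also, they are all orthogonal: if $j \neq k$ then

$$\langle v_j, v_k\rangle = \int_0^1 v_j(x)\overline{v_k(x)}\, dx = \int_0^1 e^{2\pi i j x} e^{-2\pi i k x}\, dx$$

$$= \int_0^1 e^{2\pi i (j-k) x}\, dx = \frac{e^{2\pi i (j-k) x}}{2\pi i (j-k)}\Big|_0^1$$

$$= \frac{e^{2\pi i (j-k)} - 1}{2\pi i (j-k)} = \frac{1 - 1}{2\pi i (j-k)} = 0.$$

Thus the collection $\ldots, v_{-3}, v_{-2}, v_{-1}, v_0, v_1, v_2, v_3, \ldots$ is an infinite
orthonormal collection of vectors in $C([0,1]; \mathbf{C})$.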


e We have not really discussed infinite bases, but it does turn out that,
in some sense, the above collection is an orthonormal basis; thus
every function in $C([0,1]; \mathbf{C})$ is a linear combination of complex
harmonics. (The catch is that this is an infinite linear combination, and
one needs the theory of infinite series (as in Math 33B) to make this 
precise. This would take us too far afield from this course, unfortu- 
nately). This statement - which is not very intuitive at first glance 
- was first conjectured by Fourier, and forms the basis for something 
called Fourier analysis (which is an entire course in itself!). For now, 
let us work with a simpler situation. 


e Define the space $T_n$ of trigonometric polynomials of degree at most n
to be the span of $v_0, v_1, v_2, \ldots, v_n$. In other words, $T_n$ consists of all the
functions $f \in C([0,1]; \mathbf{C})$ of the form

$$f = a_0 + a_1 e^{2\pi i x} + a_2 e^{4\pi i x} + \ldots + a_n e^{2\pi i n x}.$$

Notice that this is very similar to the space $P_n(\mathbf{R})$ of polynomials of
degree at most n, since an element $f \in P_n(\mathbf{R})$ has the form

$$f = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n,$$

and so the difference between polynomials and trigonometric polynomials
is that $x$ has been replaced by $e^{2\pi i x} = \cos(2\pi x) + i\sin(2\pi x)$ (hence
the name, trigonometric polynomial).


e $T_n$ is a subspace of the inner product space $C([0,1]; \mathbf{C})$, and is thus
itself an inner product space. Since the vectors $v_0, v_1, \ldots, v_n$ are
orthonormal, they are linearly independent, and thus $(v_0, v_1, \ldots, v_n)$ is
an orthonormal basis for $T_n$. Thus by Theorem 6, every function $f$ in
$T_n$ can be written as a series

$$f = a_0 + a_1 e^{2\pi i x} + a_2 e^{4\pi i x} + \ldots + a_n e^{2\pi i n x} = \sum_{j=0}^n a_j e^{2\pi i j x}$$

where the (complex) scalars $a_0, a_1, \ldots, a_n$ are given by the formula

$$a_j := \langle f, v_j\rangle = \int_0^1 f(x) e^{-2\pi i j x}\, dx.$$

The coefficients $a_j$ are known as the Fourier coefficients of $f$, and the
above series is known as the Fourier series of $f$. From Corollary 3 we
have the formula

$$\int_0^1 |f(x)|^2\, dx = \|f\|^2 = |a_0|^2 + |a_1|^2 + \ldots + |a_n|^2 = \sum_{j=0}^n |a_j|^2;$$


this is known as Plancherel’s formula. These formulas form the foun- 
dation of Fourier analysis, and are useful in many other areas, such 
as signal processing, partial differential equations, and number theory. 
(Actually, to be truly useful, one needs to generalize these formulas to 
handle all kinds of functions, not just trigonometric polynomials, but 
to do so is beyond the scope of this course). 
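
To see the coefficient formula and Plancherel's formula in action, here is a small numerical Python sketch. It builds a trigonometric polynomial with made-up coefficients $2$, $1-i$, $3i$, then recovers them as inner products with the harmonics. The integral over $[0,1]$ is replaced by an average over 64 equally spaced sample points; this is an assumption of the sketch, justified here because such an average integrates low-frequency harmonics exactly (up to rounding). All names are illustrative.

```python
import cmath

def harmonic(k, x):
    """The complex harmonic v_k(x) = e^{2 pi i k x}."""
    return cmath.exp(2j * cmath.pi * k * x)

def inner(f, g, samples=64):
    """Approximates the integral of f(x) * conj(g(x)) over [0,1] by an average."""
    total = sum(f(m / samples) * g(m / samples).conjugate() for m in range(samples))
    return total / samples

coeffs = [2, 1 - 1j, 3j]            # hypothetical a_0, a_1, a_2

def f(x):
    """The trigonometric polynomial a_0 + a_1 e^{2 pi i x} + a_2 e^{4 pi i x}."""
    return sum(a * harmonic(j, x) for j, a in enumerate(coeffs))

# Fourier coefficients: a_j = <f, v_j>.
recovered = [inner(f, lambda x, j=j: harmonic(j, x)) for j in range(len(coeffs))]
print(recovered)

# Plancherel: ||f||^2 should equal |a_0|^2 + |a_1|^2 + |a_2|^2 = 4 + 2 + 9 = 15.
norm_sq = inner(f, f).real
print(norm_sq, sum(abs(a) ** 2 for a in coeffs))
```

The recovered coefficients should match `coeffs` up to floating-point error, and the two numbers on the last line should agree.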


* * * * *


The Gram-Schmidt orthogonalization process. 


e In this section all vectors are assumed to belong to a fixed inner product 
space V. 




In the last section we saw how many more useful properties orthonormal 
bases had, in comparison with ordinary bases. So it would be nice 
if we had some way of converting a non-orthonormal basis into an 
orthonormal one. Fortunately, there is such a process, and it is called 
Gram-Schmidt orthogonalization. 


To make a basis orthonormal there are really two steps; first one has 
to make a basis orthogonal, and then once it is orthogonal, one has to 
make it orthonormal. The second procedure is easier to describe than 
the first, so let us describe that first. 


Definition. A unit vector is any vector v of length 1 (i.e. $\|v\| = 1$, or
equivalently $\langle v, v\rangle = 1$).


Example. In $\mathbf{R}^2$, the vector $(3/5, 4/5)$ is a unit vector, but $(3,4)$ is not.
In $C([0,1]; \mathbf{C})$, the function $x$ is not a unit vector ($\|x\|^2 = \int_0^1 x\overline{x}\, dx = 1/3$),
but $\sqrt{3}x$ is (why?). The 0 vector is never a unit vector. In $\mathbf{R}^3$,
the vectors $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ are all unit vectors.


Unit vectors are sometimes known as normalized vectors. Note that 
an orthogonal basis will be orthonormal if it consists entirely of unit 
vectors. 


Most non-zero vectors are not unit vectors, e.g. (3,4) is not a unit 
vector. However, one can always turn a non-zero vector into a unit 
vector by dividing out by its length: 


Lemma 8. If v is a non-zero vector, then $v/\|v\|$ is a unit vector.


Proof. Since v is non-zero, $\|v\|$ is non-zero, so $v/\|v\|$ is well defined.
But then

$$\left\| \frac{v}{\|v\|} \right\| = \left\| \frac{1}{\|v\|} v \right\| = \frac{1}{\|v\|}\|v\| = 1$$

and so $v/\|v\|$ is a unit vector.

We sometimes call $v/\|v\|$ the normalization of v. If $(v_1, v_2, \ldots, v_n)$ is a
basis, then we can normalize this basis by replacing each vector $v_j$ by its
normalization $v_j/\|v_j\|$, obtaining a new basis $(v_1/\|v_1\|, v_2/\|v_2\|, \ldots, v_n/\|v_n\|)$
which now consists entirely of unit vectors. (Why is this still a basis?)




Lemma 9. If $(v_1, v_2, \ldots, v_n)$ is an orthogonal basis of an inner product
space V, then the normalization $(v_1/\|v_1\|, v_2/\|v_2\|, \ldots, v_n/\|v_n\|)$ is an
orthonormal basis.


Proof. Since the basis $(v_1, \ldots, v_n)$ has n elements, V must be
n-dimensional. Since the vectors $v_j$ are orthogonal to each other, the
vectors $v_j/\|v_j\|$ must also be orthogonal to each other (multiplying
vectors by a scalar does not affect orthogonality). By Lemma 8, these
vectors are also unit vectors. So the claim follows from Corollary 5.


Example. The basis $((3,4), (-40,30))$ is an orthogonal basis of $\mathbf{R}^2$
(why?). If we normalize this basis we obtain $((3/5, 4/5), (-4/5, 3/5))$,
and this is now an orthonormal basis of $\mathbf{R}^2$.


So we now know how to turn an orthogonal basis into an orthonormal 
basis - we normalize all the vectors by dividing out their length. Now 
we come to the trickier part of the procedure - how to turn a
non-orthogonal basis into an orthogonal one. The idea is now to subtract
scalar multiples of one vector from another to make them orthogonal 
(you might see some analogy here with row operations of the second 
and third kind). 


To illustrate the idea, we first consider the problem of how to make 
just two vectors v, w orthogonal to each other. 


Lemma 10. If v and w are vectors, and w is non-zero, then the vector
$v - cw$ is orthogonal to w, where the scalar c is given by the formula

$$c := \frac{\langle v, w\rangle}{\|w\|^2}.$$

Proof. We compute

$$\langle v - cw, w\rangle = \langle v, w\rangle - c\langle w, w\rangle = \langle v, w\rangle - c\|w\|^2.$$

But since $c := \frac{\langle v, w\rangle}{\|w\|^2}$, we have $\langle v - cw, w\rangle = 0$ and so $v - cw$ is orthogonal
to w.


Example. Let $v = (3,4)$ and $w = (5,0)$. Then v and w are not
orthogonal; in fact, $\langle v, w\rangle = 15 \neq 0$. But if we replace v by the vector
$v' := v - cw = (3,4) - \frac{15}{5^2}(5,0) = (3,4) - (3,0) = (0,4)$, then $v'$ is now
orthogonal to w.




e Now we suppose that we have already made k vectors orthogonal to
each other, and now we work out how to make a $(k+1)^{th}$ vector also
orthogonal to the first k.


• Lemma 11. Let $w_1, w_2, \ldots, w_k$ be orthogonal non-zero vectors, and
let $v$ be another vector. Then the vector $v'$, defined by

$$v' := v - c_1 w_1 - c_2 w_2 - \ldots - c_k w_k$$

is orthogonal to all of $w_1, w_2, \ldots, w_k$, where the scalars $c_1, c_2, \ldots, c_k$ are
given by the formula

$$c_j := \frac{\langle v, w_j \rangle}{\|w_j\|^2}$$

for all $j = 1, \ldots, k$.


• Note that Lemma 10 is just Lemma 11 applied to the special case $k = 1$.
We can write $v'$ in series notation as

$$v' := v - \sum_{j=1}^{k} \frac{\langle v, w_j \rangle}{\|w_j\|^2} w_j.$$


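• Proof. We have to show that $v'$ is orthogonal to each $w_j$. We compute

$$\langle v', w_j \rangle = \langle v, w_j \rangle - c_1 \langle w_1, w_j \rangle - c_2 \langle w_2, w_j \rangle - \ldots - c_k \langle w_k, w_j \rangle.$$

But we are assuming that the $w_1, \ldots, w_k$ are orthogonal, so all the inner
products $\langle w_i, w_j \rangle$ are zero, except for $\langle w_j, w_j \rangle$, which is equal to $\|w_j\|^2$.
Thus

$$\langle v', w_j \rangle = \langle v, w_j \rangle - c_j \|w_j\|^2.$$

But since $c_j := \langle v, w_j \rangle / \|w_j\|^2$, we thus have $\langle v', w_j \rangle = 0$ and so $v'$ is orthogonal to
$w_j$. $\Box$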


e We can now use Lemma 11 to turn any linearly independent set of 
vectors into an orthogonal set. 


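• Gram-Schmidt orthogonalization process. Let $v_1, v_2, \ldots, v_n$ be a
linearly independent set of vectors. Suppose we construct the vectors
$w_1, \ldots, w_n$ by the formulae

$$w_1 := v_1$$
$$w_2 := v_2 - \frac{\langle v_2, w_1 \rangle}{\|w_1\|^2} w_1$$
$$w_3 := v_3 - \frac{\langle v_3, w_1 \rangle}{\|w_1\|^2} w_1 - \frac{\langle v_3, w_2 \rangle}{\|w_2\|^2} w_2$$
$$\vdots$$
$$w_n := v_n - \frac{\langle v_n, w_1 \rangle}{\|w_1\|^2} w_1 - \ldots - \frac{\langle v_n, w_{n-1} \rangle}{\|w_{n-1}\|^2} w_{n-1}.$$

Then the vectors $w_1, \ldots, w_n$ are orthogonal, non-zero, and the vector
space span$(w_1, \ldots, w_n)$ is the same vector space as span$(v_1, \ldots, v_n)$ (i.e.
the vectors $w_1, \ldots, w_n$ have the same span as $v_1, \ldots, v_n$). More generally,
we have that $w_1, \ldots, w_k$ has the same span as $v_1, \ldots, v_k$ for all
$1 \leq k \leq n$.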


Proof. We prove this by induction on $n$. In the base case $n = 1$ we
just have $w_1 := v_1$, and so clearly $v_1$ and $w_1$ have the same span. Also
$w_1$ is non-zero (since $v_1$ is linearly independent), and it is an orthogonal
collection of vectors (by default, since there is nobody else to be
orthogonal to).


Now suppose inductively that $n > 1$, and that we have already proven
the claim for $n - 1$. In particular, we already know that the vectors
$w_1, \ldots, w_{n-1}$ are orthogonal, non-zero, and that $v_1, \ldots, v_k$ has the same
span as $w_1, \ldots, w_k$ for any $1 \leq k \leq n-1$. By Lemma 11, we thus see
that the vector $w_n$ is orthogonal to $w_1, \ldots, w_{n-1}$. Now we have to show
that $w_n$ is non-zero and that $w_1, \ldots, w_n$ has the same span as $v_1, \ldots, v_n$.


Let $V$ denote the span of $v_1, \ldots, v_n$, and $W$ denote the span of $w_1, \ldots, w_n$.
We have to show that $V = W$. Note that $W$ contains the span of
$w_1, \ldots, w_{n-1}$, and hence contains the span of $v_1, \ldots, v_{n-1}$. In particular
it contains $v_1, \ldots, v_{n-1}$, and also contains $w_n$. But from the formula

$$v_n = w_n + \frac{\langle v_n, w_1 \rangle}{\|w_1\|^2} w_1 + \ldots + \frac{\langle v_n, w_{n-1} \rangle}{\|w_{n-1}\|^2} w_{n-1}$$

we thus see that $W$ contains $v_n$. Thus $W$ contains the span of $v_1, \ldots, v_n$,
i.e. $W$ contains $V$. But $V$ is $n$-dimensional (since it is the span of $n$
linearly independent vectors), and $W$ is at most $n$-dimensional (since
$W$ is also the span of $n$ vectors), and so $V$ and $W$ must actually be
equal. Furthermore this shows that $w_1, \ldots, w_n$ are linearly independent
(otherwise $W$ would have dimension less than $n$). In particular $w_n$
is non-zero. This completes everything we need to do to finish the
induction.


Example Let $v_1 := (1,1,1)$, $v_2 := (1,1,0)$, and $v_3 := (1,0,0)$. The
vectors $v_1, v_2, v_3$ are independent (in fact, they form a basis for $\mathbf{R}^3$) but
are not orthogonal. To make them orthogonal, we apply the Gram-Schmidt
orthogonalization process, setting

$$w_1 := v_1 = (1,1,1)$$

$$w_2 := v_2 - \frac{\langle v_2, w_1 \rangle}{\|w_1\|^2} w_1 = (1,1,0) - \frac{2}{3}(1,1,1) = (\frac{1}{3}, \frac{1}{3}, -\frac{2}{3})$$

$$w_3 := v_3 - \frac{\langle v_3, w_1 \rangle}{\|w_1\|^2} w_1 - \frac{\langle v_3, w_2 \rangle}{\|w_2\|^2} w_2 = (1,0,0) - \frac{1}{3}(1,1,1) - \frac{1/3}{6/9}(\frac{1}{3}, \frac{1}{3}, -\frac{2}{3}) = (1/2, -1/2, 0).$$

Thus we have created an orthogonal set $(1,1,1), (\frac{1}{3}, \frac{1}{3}, -\frac{2}{3}), (\frac{1}{2}, -\frac{1}{2}, 0)$,
which has the same span as $v_1, v_2, v_3$, i.e. it is also a basis for $\mathbf{R}^3$. Note
that we can then use Lemma 8 to normalize this basis and make it
orthonormal, obtaining the orthonormal basis

$$\frac{1}{\sqrt{3}}(1,1,1), \quad \frac{1}{\sqrt{6}}(1,1,-2), \quad \frac{1}{\sqrt{2}}(1,-1,0).$$

We shall call the normalized Gram-Schmidt orthogonalization process
the procedure of first applying the ordinary Gram-Schmidt orthogonalization
process, and then normalizing all the vectors one obtains as a
result of that process in order for them to have unit length.
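
The Gram-Schmidt formulae translate almost line by line into code. The sketch below is illustrative only: it assumes the standard dot product on $\mathbf{R}^n$, uses exact fractions for the orthogonalization step and floating point for the normalization step, and the names `gram_schmidt` and `normalize` are our own.

```python
from fractions import Fraction
import math

def dot(u, v):
    """Standard inner product on R^n (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize a linearly independent list of vectors, exactly."""
    ws = []
    for v in vectors:
        w = [Fraction(a) for a in v]
        for prev in ws:
            # Subtract the coefficient <v, w_j> / ||w_j||^2 times w_j.
            c = dot(v, prev) / dot(prev, prev)
            w = [a - c * b for a, b in zip(w, prev)]
        ws.append(tuple(w))
    return ws

def normalize(w):
    """Divide w by its length (floating point)."""
    length = math.sqrt(dot(w, w))
    return tuple(float(a) / length for a in w)

ws = gram_schmidt([(1, 1, 1), (1, 1, 0), (1, 0, 0)])
us = [normalize(w) for w in ws]   # normalized Gram-Schmidt
print(ws)
```

Running this on the three vectors of the example gives back $w_2 = (1/3, 1/3, -2/3)$ and $w_3 = (1/2, -1/2, 0)$.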


One particular consequence of Gram-Schmidt is that we always have at 
least one orthonormal basis lying around, at least for finite-dimensional 
inner product spaces. 


Corollary 12. Every finite-dimensional inner product space V has an 
orthonormal basis. 




• Proof. Let's say $V$ is $n$-dimensional. Then $V$ has some basis $(v_1, \ldots, v_n)$.
By the Gram-Schmidt orthogonalization process, we can thus create a
new collection $(w_1, \ldots, w_n)$ of non-zero orthogonal vectors. Normalizing
as in Lemma 9, we can then create a collection $(y_1, \ldots, y_n)$ of orthonormal vectors.
By Corollary 5, it is an orthonormal basis of $V$.


* * * * *


Orthogonal complements 


• We know what it means for two vectors to be orthogonal to each other,
$v \perp w$; it just means that $\langle v, w \rangle = 0$. We now state what it means for
two subspaces to be orthogonal to each other.


• Definition. Two subspaces $V_1, V_2$ of an inner product space $V$ are said
to be orthogonal if we have $v_1 \perp v_2$ for all $v_1 \in V_1$ and $v_2 \in V_2$, and we
denote this by $V_1 \perp V_2$.


• Example. The subspaces $V_1 := \{(x,y,0,0,0) : x,y \in \mathbf{R}\}$ and $V_2 :=
\{(0,0,z,w,0) : z,w \in \mathbf{R}\}$ of $\mathbf{R}^5$ are orthogonal, because $(x,y,0,0,0) \perp
(0,0,z,w,0)$ for all $x,y,z,w \in \mathbf{R}$. The space $V_1$ is similarly orthogonal
to the three-dimensional space $V_3 := \{(0,0,z,w,u) : z,w,u \in \mathbf{R}\}$.
However, $V_1$ is not orthogonal to the one-dimensional space $V_4 :=
\{(t,t,t,t,t) : t \in \mathbf{R}\}$, since the inner product of $(x,y,0,0,0)$ and
$(t,t,t,t,t)$ can be non-zero (e.g. take $x = y = t = 1$).


• Example. The zero vector space $\{0\}$ is orthogonal to any other
subspace of $V$ (why?)


• Orthogonal spaces have to be disjoint (apart from the zero vector):

• Lemma 13. If $V_1 \perp V_2$, then $V_1 \cap V_2 = \{0\}$.


• Proof. Clearly $0$ lies in $V_1 \cap V_2$ since every vector space contains $0$. Now
suppose for contradiction that $V_1 \cap V_2$ contained at least one other
vector $v$, which must of course be non-zero. Then $v \in V_1$ and $v \in V_2$;
since $V_1 \perp V_2$, this implies that $v \perp v$, i.e. that $\langle v, v \rangle = 0$. But this
implies that $\|v\|^2 = 0$, hence $v = 0$, contradiction. Thus $V_1 \cap V_2$ does
not contain any vector other than zero.




• As we can see from the above example, a subspace $V_1$ can be orthogonal
to many other subspaces $V_2$. However, there is a maximal orthogonal
subspace to $V_1$ which contains all the others:


• Definition The orthogonal complement of a subspace $V_1$ of an inner
product space $V$, denoted $V_1^\perp$, is defined to be the space of all vectors
perpendicular to $V_1$:

$$V_1^\perp := \{v \in V : v \perp w \text{ for all } w \in V_1\}.$$


• Example. Let $V_1 := \{(x,y,0,0,0) : x,y \in \mathbf{R}\}$. Then $V_1^\perp$ is the space
of all vectors $(a,b,c,d,e) \in \mathbf{R}^5$ such that $(a,b,c,d,e)$ is perpendicular
to $V_1$, i.e.

$$\langle (a,b,c,d,e), (x,y,0,0,0) \rangle = 0 \text{ for all } x,y \in \mathbf{R}.$$

In other words,

$$ax + by = 0 \text{ for all } x,y \in \mathbf{R}.$$

This can only happen when $a = b = 0$, although $c, d, e$ can be arbitrary.
Thus we have

$$V_1^\perp = \{(0,0,c,d,e) : c,d,e \in \mathbf{R}\},$$

i.e. $V_1^\perp$ is the space $V_3$ from the previous example.


• Example. If $\{0\}$ is the zero vector space, then $\{0\}^\perp = V$ (why?). A
little trickier is that $V^\perp = \{0\}$. (Exercise! Hint: if $v$ is perpendicular
to every vector in $V$, then in particular it must be perpendicular to
itself).


• From Lemma 1 we can check that $V_1^\perp$ is a subspace of $V$ (exercise!),
and is hence an inner product space.


• Lemma 14. If $V_1 \perp V_2$, then $V_2$ is a subspace of $V_1^\perp$. Conversely, if $V_2$
is a subspace of $V_1^\perp$, then $V_1 \perp V_2$.


• Proof. First suppose that $V_1 \perp V_2$. Then every vector $v$ in $V_2$ is
orthogonal to all of $V_1$, and hence lies in $V_1^\perp$ by definition of $V_1^\perp$. Thus
$V_2$ is a subspace of $V_1^\perp$. Conversely, if $V_2$ is a subspace of $V_1^\perp$, then
every vector $v$ in $V_2$ is in $V_1^\perp$ and is thus orthogonal to every vector in
$V_1$. Thus $V_1$ and $V_2$ are orthogonal.




Sometimes it is not so easy to compute the orthogonal complement of 
a vector space, but the following result gives one way to do so. 


Theorem 15. Let $W$ be a $k$-dimensional subspace of an $n$-dimensional
inner product space $V$. Let $(v_1, \ldots, v_k)$ be a basis of $W$, and let
$(v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n)$ be an extension of that basis to be a basis
of $V$. Let $(w_1, \ldots, w_n)$ be the normalized Gram-Schmidt orthogonalization
of $(v_1, \ldots, v_n)$. Then $(w_1, \ldots, w_k)$ is an orthonormal basis of
$W$, and $(w_{k+1}, \ldots, w_n)$ is an orthonormal basis of $W^\perp$.


Proof. From the Gram-Schmidt orthogonalization process, we know
that $(w_1, \ldots, w_k)$ spans the same space as $(v_1, \ldots, v_k)$ - i.e. it spans
$W$. Since $W$ is $k$-dimensional, this means that $(w_1, \ldots, w_k)$ is a basis
for $W$, which is orthonormal by the normalized Gram-Schmidt process.
Similarly $(w_1, \ldots, w_n)$ spans the $n$-dimensional space $V$, which implies
that it is a basis for $V$.


Thus the vectors $w_{k+1}, \ldots, w_n$ are orthonormal and thus (by Corollary 4)
linearly independent. It remains to show that they span $W^\perp$.
First we show that they lie in $W^\perp$. Let $w_j$ be one of these vectors. Then
$w_j$ is orthogonal to $w_1, \ldots, w_k$, and is thus (by Lemma 1) orthogonal
to their span, which is $W$. Thus $w_j$ lies in $W^\perp$. In particular, the span
of $w_{k+1}, \ldots, w_n$ must lie inside $W^\perp$.


Now we show that every vector in $W^\perp$ lies in the span of $w_{k+1}, \ldots, w_n$. Let
$v$ be any vector in $W^\perp$. By Theorem 6 we have

$$v = \langle v, w_1 \rangle w_1 + \ldots + \langle v, w_n \rangle w_n.$$

But since $v \in W^\perp$, $v$ is orthogonal to $w_1, \ldots, w_k$, and so the first $k$
terms on the right-hand side vanish. Thus we have

$$v = \langle v, w_{k+1} \rangle w_{k+1} + \ldots + \langle v, w_n \rangle w_n$$

and in particular $v$ is in the span of $w_{k+1}, \ldots, w_n$, as desired.


Corollary 16 (Dimension theorem for orthogonal complements)
If $W$ is a subspace of a finite-dimensional inner product space $V$, then
$\dim(W) + \dim(W^\perp) = \dim(V)$.




• Example. Suppose we want to find the orthogonal complement of the
line $W := \{(x,y) \in \mathbf{R}^2 : 3x + 4y = 0\}$ in $\mathbf{R}^2$. This example is so simple
that one could do this directly, but instead we shall choose to do this via
Theorem 15 for sake of illustration. We first need to find a basis for $W$;
since $W$ is one-dimensional, we just need to find one non-zero vector in $W$,
e.g. $v_1 := (-4,3)$, and this will be our basis. Then we extend this basis
to a basis of the two-dimensional space $\mathbf{R}^2$ by adding one more linearly
independent vector, for instance we could take $v_2 := (1,0)$. This basis
is not orthogonal or orthonormal, but we can apply the Gram-Schmidt
process to make it orthogonal:

$$w_1 := v_1 = (-4,3); \quad w_2 := v_2 - \frac{\langle v_2, w_1 \rangle}{\|w_1\|^2} w_1 = (1,0) - \frac{-4}{25}(-4,3) = (\frac{9}{25}, \frac{12}{25}).$$

We can then normalize:

$$w_1' := w_1/\|w_1\| = (-\frac{4}{5}, \frac{3}{5}); \quad w_2' := w_2/\|w_2\| = (\frac{3}{5}, \frac{4}{5}).$$

Thus $w_1'$ is an orthonormal basis for $W$, $w_2'$ is an orthonormal basis for
$W^\perp$, and $(w_1', w_2')$ is an orthonormal basis for $\mathbf{R}^2$. (Note that we could
skip the normalization step at the end if one only wanted an orthogonal
basis for these spaces, as opposed to an orthonormal basis).
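
The recipe of Theorem 15 (take a basis of $W$, extend it, orthogonalize, and split the result) can be sketched in a few lines. This is only an illustration under the assumption of the standard dot product on $\mathbf{R}^n$; the normalization step is skipped, so it returns orthogonal rather than orthonormal bases, and the function names are ours.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthogonalize a linearly independent list of vectors, exactly."""
    ws = []
    for v in vectors:
        w = [Fraction(a) for a in v]
        for prev in ws:
            c = dot(v, prev) / dot(prev, prev)
            w = [a - c * b for a, b in zip(w, prev)]
        ws.append(tuple(w))
    return ws

def split_orthogonal(basis_of_W, extension):
    """Return orthogonal bases of W and of its orthogonal complement."""
    ws = gram_schmidt(list(basis_of_W) + list(extension))
    k = len(basis_of_W)
    return ws[:k], ws[k:]

W_basis, W_perp_basis = split_orthogonal([(-4, 3)], [(1, 0)])
print(W_basis, W_perp_basis)
```

For the line $3x + 4y = 0$ this gives $(-4,3)$ as a basis of $W$ and $(9/25, 12/25)$ as a basis of $W^\perp$, matching the computation above.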


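• Example. Let's give $P_2(\mathbf{R})$ the inner product

$$\langle f, g \rangle := \int_{-1}^{1} f(x) g(x)\,dx.$$

The space $P_2(\mathbf{R})$ contains $P_1(\mathbf{R})$ as a subspace. Suppose we wish to
compute the orthogonal complement of $W := P_1(\mathbf{R})$. We begin by taking a
basis of $P_1(\mathbf{R})$ - let's use the standard basis $(1, x)$, and then extend it
to a basis of $P_2(\mathbf{R})$ - e.g. $(1, x, x^2)$. We then apply the Gram-Schmidt
orthogonalization procedure:

$$w_1 := 1$$
$$w_2 := x - \frac{\langle x, w_1 \rangle}{\|w_1\|^2} w_1 = x - \frac{0}{2} \cdot 1 = x$$
$$w_3 := x^2 - \frac{\langle x^2, w_1 \rangle}{\|w_1\|^2} w_1 - \frac{\langle x^2, w_2 \rangle}{\|w_2\|^2} w_2 = x^2 - \frac{2/3}{2} \cdot 1 - \frac{0}{2/3} x = x^2 - \frac{1}{3}.$$

We can then normalize:

$$w_1' := w_1/\|w_1\| = \frac{1}{\sqrt{2}}$$
$$w_2' := w_2/\|w_2\| = \frac{\sqrt{3}}{\sqrt{2}} x$$
$$w_3' := w_3/\|w_3\| = \frac{\sqrt{45}}{\sqrt{8}} (x^2 - \frac{1}{3}).$$

Thus $W^\perp$ has $w_3'$ as an orthonormal basis. Or one can just use $w_3$ as a
basis, so that

$$W^\perp = \{a(x^2 - \frac{1}{3}) : a \in \mathbf{R}\}.$$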


• Corollary 17 If $W$ is a $k$-dimensional subspace of an $n$-dimensional
inner product space $V$, then every vector $v \in V$ can be written in
exactly one way as $w + u$, where $w \in W$ and $u \in W^\perp$.


• Proof. By Theorem 15, we can find an orthonormal basis $(w_1, w_2, \ldots, w_n)$
of $V$ such that $(w_1, \ldots, w_k)$ is an orthonormal basis of $W$ and $(w_{k+1}, \ldots, w_n)$
is an orthonormal basis of $W^\perp$. If $v$ is a vector in $V$, then we can write

$$v = a_1 w_1 + \ldots + a_n w_n$$

for some scalars $a_1, \ldots, a_n$. If we write

$$w := a_1 w_1 + \ldots + a_k w_k; \quad u := a_{k+1} w_{k+1} + \ldots + a_n w_n$$

then we have $w \in W$, $u \in W^\perp$, and $v = w + u$. Now we show that this
is the only way to decompose $v$ in this manner. If $v = w' + u'$ for some
$w' \in W$, $u' \in W^\perp$, then

$$w + u = w' + u'$$

and so

$$w - w' = u' - u.$$

But $w - w'$ lies in $W$, and $u' - u$ lies in $W^\perp$. By Lemma 13, this vector
must be 0, so that $w = w'$ and $u = u'$. Thus there is no other way to
write $v = w + u$.


We call the vector $w$ obtained in the above manner the orthogonal
projection of $v$ onto $W$; this terminology can be made clear by a picture
(at least in $\mathbf{R}^2$ or $\mathbf{R}^3$), since $w$, $u$, and $v$ form a right-angled triangle
whose base $w$ lies in $W$. This projection can be computed as

$$w = a_1 w_1 + \ldots + a_k w_k = \langle v, w_1 \rangle w_1 + \ldots + \langle v, w_k \rangle w_k$$

where $w_1, \ldots, w_k$ is any orthonormal basis of $W$.


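Example Let $W := \{(x,y) \in \mathbf{R}^2 : 3x + 4y = 0\}$ be as before. Suppose
we wish to find the orthogonal projection of the vector $(1,1)$ to $W$.
Since we have an orthonormal basis given by $w_1' := (-\frac{4}{5}, \frac{3}{5})$, we can
compute the orthogonal projection as

$$w = \langle (1,1), w_1' \rangle w_1' = -\frac{1}{5}(-\frac{4}{5}, \frac{3}{5}) = (\frac{4}{25}, -\frac{3}{25}).$$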
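
As a numerical companion to this example, the sketch below computes the projection from an orthogonal (not necessarily orthonormal) basis, using the equivalent formula with coefficients $\langle v, w_j \rangle / \|w_j\|^2$, and then samples a few other points of $W$ to compare distances. It assumes the standard dot product on $\mathbf{R}^n$; `project` and `dist2` are names of our own choosing.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product on R^n (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def project(v, orthogonal_basis):
    """Orthogonal projection of v onto the span of mutually orthogonal non-zero vectors."""
    proj = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        c = Fraction(dot(v, w)) / dot(w, w)
        proj = [p + c * b for p, b in zip(proj, w)]
    return tuple(proj)

def dist2(p, q):
    """Squared distance between two points."""
    d = [a - b for a, b in zip(p, q)]
    return dot(d, d)

v = (1, 1)
w = project(v, [(-4, 3)])                  # projection onto the line 3x + 4y = 0
u = tuple(a - b for a, b in zip(v, w))     # the component of v in W-perp

# A handful of other points on the line W, for comparison.
others = [tuple(Fraction(t, 10) * c for c in (-4, 3)) for t in range(-10, 11)]
print(w, dist2(v, w), min(dist2(v, p) for p in others))
```

The projection comes out as $(4/25, -3/25)$, the remainder $u$ is orthogonal to $(-4,3)$, and every sampled point of $W$ is strictly further from $v$ than $w$ is.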


The orthogonal projection has the “nearest neighbour” property: 


Theorem 18. Let $W$ be a subspace of a finite-dimensional inner product
space $V$, let $v$ be a vector in $V$, and let $w$ be the orthogonal
projection of $v$ onto $W$. Then $w$ is closer to $v$ than any other element of
$W$; more precisely, we have $\|v - w'\| > \|v - w\|$ for all vectors $w'$ in $W$
other than $w$.


Proof. Write $v = w + u$, where $w \in W$ is the orthogonal projection of $v$
onto $W$, and $u \in W^\perp$. Then we have $\|v - w\| = \|u\|$. On the other hand,
to compute $v - w'$, we write $v - w' = (v - w) + (w - w') = u + (w - w')$.
Since $w, w'$ lie in $W$, $w - w'$ does also. But $u$ lies in $W^\perp$, thus $u \perp w - w'$.
By Pythagoras's theorem we thus have

$$\|v - w'\|^2 = \|u\|^2 + \|w - w'\|^2 > \|u\|^2$$

(since $w \neq w'$) and so $\|v - w'\| > \|u\| = \|v - w\|$ as desired.




• This Theorem makes the orthogonal projection useful for approximating
a given vector $v$ in $V$ by another vector in a given subspace $W$ of
$V$.


• Example Consider the vector $x^2$ in $P_2(\mathbf{R})$. Suppose we want to find
the linear polynomial $ax + b \in P_1(\mathbf{R})$ which is closest to $x^2$ (using the
inner product on $[-1,1]$ from the previous example to define length).
By Theorem 18, this linear polynomial will be the orthogonal projection
of $x^2$ to $P_1(\mathbf{R})$. Using the orthonormal basis $w_1' = \frac{1}{\sqrt{2}}$, $w_2' = \frac{\sqrt{3}}{\sqrt{2}} x$ from
the prior example, we thus see that this linear polynomial is

$$\langle x^2, w_1' \rangle w_1' + \langle x^2, w_2' \rangle w_2' = \frac{2/3}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} + 0 \cdot \frac{\sqrt{3}}{\sqrt{2}} x = \frac{1}{3} + 0x.$$

Thus the function $1/3 + 0x$ is the closest linear polynomial to $x^2$ using
the inner product on $[-1,1]$. (If one uses a different inner product,
one can get a different "closest approximation"; the notion of closeness
depends very much on how one measures length).
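
The same projection can be checked with exact arithmetic. In the sketch below a polynomial is stored as its list of coefficients $[a_0, a_1, \ldots]$ (a representation we choose purely for illustration), and the inner product on $[-1,1]$ is evaluated term by term using $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$ and $0$ for odd $k$. Since $1$ and $x$ are orthogonal but not unit vectors, the coefficients are divided by the squared norms.

```python
from fractions import Fraction

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1]; p, q are coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:   # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

one, x, x_squared = [1], [0, 1], [0, 0, 1]

# Coefficients of the projection of x^2 onto span(1, x).
c0 = inner(x_squared, one) / inner(one, one)
c1 = inner(x_squared, x) / inner(x, x)
print(c0, c1)
```

This yields $c_0 = 1/3$ and $c_1 = 0$, i.e. the closest linear polynomial is $1/3 + 0x$.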




Math 115A - Week 10 
Textbook sections: 3.1-5.1 
Topics covered: 


Linear functionals 

Adjoints of linear operators 
Self-adjoint operators 
Normal operators 

Stuff about the final 


* * * * *


Linear functionals 


Let $F$ be either the real or complex numbers, and let $V$, $W$ be vector
spaces over the field of scalars $F$. We know what a linear transformation
$T$ from $V$ to $W$ is; it is a transformation that takes as input a vector $v$
in $V$ and returns a vector $Tv$ in $W$, which preserves addition $T(v + v') =
Tv + Tv'$ and scalar multiplication $T(cv) = cTv$.


We now look at some special types of linear transformation, where the
input space $V$ or the output space $W$ is very small. We first look at
what happens when the input space is just $F$, the field of scalars.


Example The linear transformation $T : \mathbf{R} \to \mathbf{R}^3$ defined by $Tc :=
(3c, 4c, 5c)$ is a linear transformation from the field of scalars $\mathbf{R}$ to a
vector space $\mathbf{R}^3$.


Note that the above example can be written as $Tc := cw$, where $w$ is
the vector $(3,4,5)$ in $\mathbf{R}^3$. The following lemma says, in fact, that all
linear transformations from the field of scalars to another vector space
are of this form:


Lemma 1. Let $T : F \to W$ be a linear transformation from $F$ to $W$.
Then there is a vector $w \in W$ such that $Tc = cw$ for all $c \in F$.




Proof Since $c = c1$, we have $Tc = T(c1) = c(T1)$. So if we set $w := T1$,
then we have $Tc = cw$ for all $c \in F$.


Now we look at what happens when the output space is the field of 
scalars. 


Definition A linear functional on a vector space $V$ is a linear
transformation $T : V \to F$ from $V$ to the field of scalars $F$.


Thus linear functionals are in some sense the "opposite" of vectors:
they "eat" a vector as input and spit out a scalar as output. (They are
sometimes called covectors or dual vectors for this reason; sometimes
physicists call them axial vectors. Another name used is 1-forms. In
quantum mechanics, one sometimes uses Dirac's "bra-ket" notation, in
which vectors are called "kets" and covectors are called "bras").


Example 1. The linear transformation $T : \mathbf{R}^3 \to \mathbf{R}$ defined by
$T(x,y,z) := 3x + 4y + 5z$ is a linear functional on $\mathbf{R}^3$. Another
example is altitude: the linear transformation $A : \mathbf{R}^3 \to \mathbf{R}$ defined by
$A(x,y,z) := z$; this takes a vector in three-dimensional space as input
and returns its altitude (the $z$ co-ordinate).


Example 2 (integration as a linear functional). The linear
transformation $I : C([0,1]; \mathbf{R}) \to \mathbf{R}$ defined by $If := \int_0^1 f(x)\,dx$ is a linear
functional, for instance $I(x^2) = 1/3$.


Example 3 (evaluation as a linear functional). The linear
transformation $E : C([0,1]; \mathbf{R}) \to \mathbf{R}$ defined by $Ef := f(0)$ is a linear
functional, thus for instance $E(x^2) = 0$, and $E(e^x) = 1$.


Example 4. Let $V$ be any inner product space, and let $w$ be any
vector in $V$. Then the linear transformation $T : V \to F$ defined by
$Tv := \langle v, w \rangle$ is a linear functional on $V$ (this is because the inner
product is linear in the first variable $v$). For instance, the linear
functional $T(x,y,z) := 3x + 4y + 5z$ in Example 1 is of this type, since
$T(x,y,z) = \langle (x,y,z), (3,4,5) \rangle$; similarly the altitude function can be
written in this form, as $A(x,y,z) = \langle (x,y,z), (0,0,1) \rangle$. The
integration functional $I$ in Example 2 is also of this form, since $If = \langle f, 1 \rangle$.




(As it turns out, the evaluation function $E$ from Example 3 is not of
this form, at least on $C([0,1]; \mathbf{R})$; but see below.)


It turns out that on a finite-dimensional inner product space $V$, every
linear functional is of the form given in the previous example:


Riesz representation theorem. Let $V$ be a finite-dimensional inner
product space, and let $T : V \to F$ be a linear functional on $V$. Then
there is a vector $w \in V$ such that $Tv = \langle v, w \rangle$ for all $v \in V$.


Proof. Let's say that $V$ is $n$-dimensional. By the Gram-Schmidt
orthogonalization process we can find an orthonormal basis $v_1, v_2, \ldots, v_n$
of $V$. Let $v$ be any vector in $V$. From the previous week's notes we
have the formula

$$v = \langle v, v_1 \rangle v_1 + \ldots + \langle v, v_n \rangle v_n.$$

Applying $T$ to both sides, we obtain

$$Tv = \langle v, v_1 \rangle Tv_1 + \ldots + \langle v, v_n \rangle Tv_n.$$

Since $Tv_1, \ldots, Tv_n$ are all scalars, and $\langle v, w \rangle c = \langle v, \overline{c} w \rangle$ for any scalar
$c$, we thus have

$$Tv = \langle v, \overline{Tv_1} v_1 + \ldots + \overline{Tv_n} v_n \rangle.$$

Thus if we let $w \in V$ be the vector

$$w := \overline{Tv_1} v_1 + \ldots + \overline{Tv_n} v_n$$

then we have $Tv = \langle v, w \rangle$ for all $v \in V$, as desired.


(Actually, this is only the Riesz representation theorem for
finite-dimensional spaces. There are more general Riesz representation theorems
for such infinite-dimensional spaces as $C([0,1]; \mathbf{R})$, but they are beyond
the scope of this course).


Example Consider the linear functional $T : \mathbf{C}^3 \to \mathbf{C}$ defined by
$T(x,y,z) := 3x + iy + 5z$. From the Riesz representation theorem we
know that there must be some vector $w \in \mathbf{C}^3$ such that $Tv = \langle v, w \rangle$
for all $v \in \mathbf{C}^3$. In this case we can see what $w$ is by inspection, but let
us pretend that we are unable to see this, and instead use the formula
in the proof of the Riesz representation theorem. Namely, we know
that

$$w := \overline{Tv_1} v_1 + \ldots + \overline{Tv_n} v_n$$

whenever $v_1, \ldots, v_n$ is an orthonormal basis for $\mathbf{C}^3$. Thus, using the
standard basis $(1,0,0), (0,1,0), (0,0,1)$, we obtain

$$w := \overline{T(1,0,0)}(1,0,0) + \overline{T(0,1,0)}(0,1,0) + \overline{T(0,0,1)}(0,0,1)$$
$$= \overline{3}(1,0,0) + \overline{i}(0,1,0) + \overline{5}(0,0,1) = (3, -i, 5).$$

Thus $Tv = \langle v, (3, -i, 5) \rangle$, which one can easily check is consistent with
our definition of $T$.
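
The formula from the proof is short enough to try out directly with Python's built-in complex numbers. This is a hedged sketch: `inner` implements the standard inner product on $\mathbf{C}^n$ (linear in the first slot, conjugate-linear in the second, as in these notes), and the sample vector `v` is an arbitrary choice of ours.

```python
def inner(v, w):
    """Standard inner product on C^n: linear in v, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def T(v):
    """The linear functional T(x, y, z) = 3x + iy + 5z from the example."""
    x, y, z = v
    return 3 * x + 1j * y + 5 * z

# Riesz vector: w = sum of conj(T e_j) e_j over the standard basis.
e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
w = tuple(complex(T(ej)).conjugate() for ej in e)

v = (1 + 2j, -1j, 2)          # arbitrary test vector
print(w, T(v), inner(v, w))
```

The vector `w` equals $(3, -i, 5)$, and $Tv$ agrees with $\langle v, w \rangle$ on the test vector.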


More generally, we see that any linear functional $T : F^n \to F$ (where
$F = \mathbf{R}$ or $\mathbf{C}$) can be written in the form $Tv := \langle v, w \rangle$, where $w$ is the
vector

$$w := (\overline{Te_1}, \overline{Te_2}, \ldots, \overline{Te_n})$$

and $e_1, \ldots, e_n$ is the standard basis for $F^n$ (i.e. the first component of
$w$ is $\overline{Te_1}$, etc. For instance, in the previous example $Te_1 = T(1,0,0) =
3$, so the first component of $w$ is $\overline{3} = 3$).


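Example Let $P_2(\mathbf{R})$ be the polynomials of degree at most 2, with the
inner product

$$\langle f, g \rangle := \int_{-1}^{1} f(x) g(x)\,dx.$$

Let $E : P_2(\mathbf{R}) \to \mathbf{R}$ be the evaluation function $E(f) := f(0)$, for
instance $E(x^2 + 2x + 3) = 3$. From the Riesz representation theorem
we know that $E(f) = \langle f, w \rangle$ for some $w \in P_2(\mathbf{R})$; we now find what
this $w$ is. We first find an orthonormal basis for $P_2(\mathbf{R})$. From last
week's notes, we know that

$$v_1 := \frac{1}{\sqrt{2}}; \quad v_2 := \frac{\sqrt{3}}{\sqrt{2}} x; \quad v_3 := \frac{\sqrt{45}}{\sqrt{8}} (x^2 - \frac{1}{3})$$

is an orthonormal basis for $P_2(\mathbf{R})$. Thus we can compute $w$ using the
formula

$$w = \overline{Tv_1} v_1 + \overline{Tv_2} v_2 + \overline{Tv_3} v_3$$

from the proof of the Riesz representation theorem (with $T = E$). Since $Tv_1 = \frac{1}{\sqrt{2}}$,
$Tv_2 = 0$, and $Tv_3 = \frac{\sqrt{45}}{\sqrt{8}}(-\frac{1}{3})$, we thus have

$$w = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{3} \cdot \frac{\sqrt{45}}{\sqrt{8}} \cdot \frac{\sqrt{45}}{\sqrt{8}} (x^2 - \frac{1}{3})$$

which simplifies to

$$w = \frac{9}{8} - \frac{15}{8} x^2.$$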

It may seem that the vector w that is obtained by the Riesz representa- 
tion theorem would depend on which orthonormal basis v1,...,v, one 
chooses for V. But it turns out that this is not the case: 


Lemma 2. Let $T : V \to \mathbf{R}$ be a linear functional on an inner product
space $V$. Then there can be at most one vector $w \in V$ with the property
that $Tv = \langle v, w \rangle$ for all $v \in V$.


Proof. Suppose for contradiction that there were at least two different
vectors $w, w'$ in $V$ such that $Tv = \langle v, w \rangle$ and $Tv = \langle v, w' \rangle$ for all $v \in V$.
Then we have

$$\langle v, w - w' \rangle = \langle v, w \rangle - \langle v, w' \rangle = Tv - Tv = 0$$

for all $v \in V$. In particular, if we apply this identity to the vector
$v := w - w'$ we obtain

$$\|w - w'\|^2 = \langle w - w', w - w' \rangle = 0$$

which implies that $w - w' = 0$, so that $w$ and $w'$ are not different after
all. This contradiction shows that there could only have been one such
vector $w$ to begin with, as desired.


Another way to view Lemma 2 is the following: if $\langle v, w \rangle = \langle v, w' \rangle$ for
all $v \in V$, then $w$ and $w'$ must be equal. (If you like, this is sort of
like being able to "cancel" $v$ from both sides of an identity involving
an inner product, provided that you know the identity holds for all $v$).




* * * * *


Adjoints 


• The Riesz representation theorem allows us to turn linear functionals
$T : V \to \mathbf{R}$ into vectors $w \in V$, if $V$ is a finite-dimensional inner
product space. This leads us to a useful notion, that of the adjoint of
a linear operator.


• Let $T : V \to W$ be a linear transformation from one inner product
space to another. Then for every vector $w \in W$, we can define a linear
functional $T_w : V \to \mathbf{R}$ by the formula

$$T_w v := \langle Tv, w \rangle.$$


• Example. If $T : \mathbf{R}^3 \to \mathbf{R}^2$ is the linear transformation

$$T(x,y,z) := (x + 2y + 3z, 4x + 5y + 6z)$$

and $w$ was the vector $(10,1) \in \mathbf{R}^2$, then $T_w : \mathbf{R}^3 \to \mathbf{R}$ would be the
linear functional

$$T_w(x,y,z) = \langle (x + 2y + 3z, 4x + 5y + 6z), (10,1) \rangle = 14x + 25y + 36z.$$


• One can easily check that $T_w$ is indeed a linear functional on $V$:

$$T_w(v + v') = \langle T(v + v'), w \rangle = \langle Tv + Tv', w \rangle = \langle Tv, w \rangle + \langle Tv', w \rangle = T_w v + T_w v'$$
$$T_w(cv) = \langle T(cv), w \rangle = \langle cTv, w \rangle = c \langle Tv, w \rangle = c T_w v.$$


• By the Riesz representation theorem, there must be a vector, called
$T^* w \in V$, such that $T_w v = \langle v, T^* w \rangle$ for all $v \in V$, or in other words
that

$$\langle Tv, w \rangle = \langle v, T^* w \rangle$$

for all $w \in W$ and $v \in V$; this is probably the most basic property of
$T^*$. Note that by Lemma 2, there can only be one possible value for
$T^* w$ for each $w$.




Example Continuing the previous example, we see that

$$T_w(x,y,z) = \langle (x,y,z), (14, 25, 36) \rangle$$

and hence by Lemma 2, the only possible choice for $T^* w$ is

$$T^*(10,1) = T^* w = (14, 25, 36).$$


Note that while $T$ is a transformation that turns a vector $v$ in $V$ to a
vector $Tv$ in $W$, $T^*$ does the opposite, starting with a vector $w$ in $W$ as
input and returning a vector $T^* w$ in $V$ as output. This seems similar to
how an inverse $T^{-1}$ of $T$ would work, but it is important to emphasize
that $T^*$ is not the inverse of $T$, and it makes sense even when $T$ is not
invertible.


We refer to $T^* : W \to V$ as the adjoint of $T$. Thus when we move an
operator $T$ from one side of an inner product to another, we have to
replace it with its adjoint. This is similar to how when one moves a
scalar from one side of an inner product to another, one has to replace
it by its complex conjugate: $\langle cv, w \rangle = \langle v, \overline{c} w \rangle$. Thus the adjoint is like
the complex conjugate, but for linear transformations rather than for
scalars.


Lemma 3. If $T : V \to W$ is a linear transformation, then its adjoint
$T^* : W \to V$ is also a linear transformation.


Proof. We have to prove that $T^*(w + w') = T^* w + T^* w'$ and $T^*(cw) =
cT^* w$ for all $w, w' \in W$ and scalars $c$.


First we prove that $T^*(w + w') = T^* w + T^* w'$. By definition of $T^*$, we
have

$$\langle v, T^*(w + w') \rangle = \langle Tv, w + w' \rangle$$

for all $v \in V$. But

$$\langle Tv, w + w' \rangle = \langle Tv, w \rangle + \langle Tv, w' \rangle = \langle v, T^* w \rangle + \langle v, T^* w' \rangle = \langle v, T^* w + T^* w' \rangle.$$

Thus we have

$$\langle v, T^*(w + w') \rangle = \langle v, T^* w + T^* w' \rangle$$

for all $v \in V$. By Lemma 2, we must therefore have $T^* w + T^* w' =
T^*(w + w')$ as desired.




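• Now we show that $T^*(cw) = cT^* w$. We have

$$\langle v, T^*(cw) \rangle = \langle Tv, cw \rangle = \overline{c} \langle Tv, w \rangle = \overline{c} \langle v, T^* w \rangle = \langle v, cT^* w \rangle$$

for all $v \in V$. By Lemma 2, we thus have $T^*(cw) = cT^* w$ as desired. $\Box$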


• Example Let us continue our previous example of the linear
transformation $T : \mathbf{R}^3 \to \mathbf{R}^2$ defined by

$$T(x,y,z) := (x + 2y + 3z, 4x + 5y + 6z).$$

Let us work out what $T^* : \mathbf{R}^2 \to \mathbf{R}^3$ is. Let $(a,b)$ be any vector in $\mathbf{R}^2$.
Then we have

$$\langle T(x,y,z), (a,b) \rangle = \langle (x,y,z), T^*(a,b) \rangle$$

for all $(x,y,z) \in \mathbf{R}^3$. The left-hand side is

$$\langle (x + 2y + 3z, 4x + 5y + 6z), (a,b) \rangle = a(x + 2y + 3z) + b(4x + 5y + 6z)$$
$$= (a + 4b)x + (2a + 5b)y + (3a + 6b)z = \langle (x,y,z), (a + 4b, 2a + 5b, 3a + 6b) \rangle.$$

Thus we have

$$\langle (x,y,z), (a + 4b, 2a + 5b, 3a + 6b) \rangle = \langle (x,y,z), T^*(a,b) \rangle$$

for all $x, y, z$; by Lemma 2, this implies that

$$T^*(a,b) = (a + 4b, 2a + 5b, 3a + 6b).$$


• This example was rather tedious to compute. However, things become
easier with the aid of orthonormal bases. Recall (from Corollary 7 of
last week's notes) that if $v$ is a vector in $V$ and $\beta := (v_1, \ldots, v_n)$ is an
orthonormal basis of $V$, then the column vector $[v]^\beta$ is given by

$$[v]^\beta = \begin{pmatrix} \langle v, v_1 \rangle \\ \vdots \\ \langle v, v_n \rangle \end{pmatrix}.$$

Thus the $i^{th}$ row entry of $[v]^\beta$ is just $\langle v, v_i \rangle$.




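Now suppose that $T : V \to W$ is a linear transformation, and $\beta :=
(v_1, \ldots, v_n)$ is an orthonormal basis of $V$ and $\gamma := (w_1, \ldots, w_m)$ is an
orthonormal basis of $W$. Then $[T]_\beta^\gamma$ is a matrix with $m$ rows and $n$
columns, whose $j^{th}$ column is given by $[Tv_j]^\gamma$. In other words, we have

$$[T]_\beta^\gamma = \begin{pmatrix} \langle Tv_1, w_1 \rangle & \langle Tv_2, w_1 \rangle & \ldots & \langle Tv_n, w_1 \rangle \\ \langle Tv_1, w_2 \rangle & \langle Tv_2, w_2 \rangle & \ldots & \langle Tv_n, w_2 \rangle \\ \vdots & \vdots & & \vdots \\ \langle Tv_1, w_m \rangle & \langle Tv_2, w_m \rangle & \ldots & \langle Tv_n, w_m \rangle \end{pmatrix}.$$

In other words, the entry in the $i^{th}$ row and $j^{th}$ column is $\langle Tv_j, w_i \rangle$.


We can apply similar reasoning to the linear transformation $T^* : W \to
V$. Then $[T^*]_\gamma^\beta$ is a matrix with $n$ rows and $m$ columns, and the entry
in the $i^{th}$ row and $j^{th}$ column is $\langle T^* w_j, v_i \rangle$. But

$$\langle T^* w_j, v_i \rangle = \overline{\langle v_i, T^* w_j \rangle} = \overline{\langle Tv_i, w_j \rangle}.$$

Thus, the matrix $[T^*]_\gamma^\beta$ is given by

$$[T^*]_\gamma^\beta = \begin{pmatrix} \overline{\langle Tv_1, w_1 \rangle} & \overline{\langle Tv_1, w_2 \rangle} & \ldots & \overline{\langle Tv_1, w_m \rangle} \\ \overline{\langle Tv_2, w_1 \rangle} & \overline{\langle Tv_2, w_2 \rangle} & \ldots & \overline{\langle Tv_2, w_m \rangle} \\ \vdots & \vdots & & \vdots \\ \overline{\langle Tv_n, w_1 \rangle} & \overline{\langle Tv_n, w_2 \rangle} & \ldots & \overline{\langle Tv_n, w_m \rangle} \end{pmatrix}.$$

Comparing this with our formula for $[T]_\beta^\gamma$, we see that $[T^*]_\gamma^\beta$ is the adjoint
of $[T]_\beta^\gamma$:


Theorem 3. If $T : V \to W$ is a linear transformation, $\beta$ is an
orthonormal basis of $V$, and $\gamma$ is an orthonormal basis of $W$, then

$$[T^*]_\gamma^\beta = ([T]_\beta^\gamma)^\dagger.$$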


Example Let us once again take the example of the linear
transformation $T : \mathbf{R}^3 \to \mathbf{R}^2$ defined by

$$T(x,y,z) := (x + 2y + 3z, 4x + 5y + 6z).$$

Let $\beta := ((1,0,0), (0,1,0), (0,0,1))$ be the standard basis of $\mathbf{R}^3$, and
let $\gamma := ((1,0), (0,1))$ be the standard basis of $\mathbf{R}^2$. Then we have

$$[T]_\beta^\gamma = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

(why?). On the other hand, if we write the linear transformation

$$T^*(a,b) = (a + 4b, 2a + 5b, 3a + 6b)$$

in matrix form, we see that

$$[T^*]_\gamma^\beta = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix},$$

which is the adjoint of $[T]_\beta^\gamma$. (In this example, the field of scalars is real,
and so the complex conjugation aspect of the adjoint does not make an
appearance.)


The following corollary connects the notion of adjoint of a linear
transformation with that of adjoint of a matrix.


Corollary 4. Let $A$ be an $m \times n$ matrix with either real or complex
entries. Then the adjoint of $L_A$ is $L_{A^\dagger}$.


Proof. Let $F$ be the field of scalars that the entries of $A$ lie in. Then
$L_A$ is a linear transformation from $F^n$ to $F^m$, and $L_{A^\dagger}$ is a linear
transformation from $F^m$ to $F^n$. If we let $\beta$ be the standard basis of $F^n$ and
$\gamma$ be the standard basis of $F^m$, then by Theorem 3

$$[L_A^*]_\gamma^\beta = ([L_A]_\beta^\gamma)^\dagger = A^\dagger = [L_{A^\dagger}]_\gamma^\beta$$

and hence $L_A^* = L_{A^\dagger}$ as desired.

In particular, we see that

$$\langle Av, w \rangle = \langle v, A^\dagger w \rangle$$

for any $m \times n$ matrix $A$, any column vector $v$ of length $n$, and any
column vector $w$ of length $m$.




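• Example Let $A$ be the matrix

$$A = \begin{pmatrix} 1 & i & 0 \\ 0 & 1+i & 3 \end{pmatrix}$$

so that $L_A : \mathbf{C}^3 \to \mathbf{C}^2$ is the linear transformation defined by

$$L_A \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = A \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} z_1 + i z_2 \\ (1+i) z_2 + 3 z_3 \end{pmatrix}.$$

Then the adjoint of this transformation is given by $L_{A^\dagger}$, where $A^\dagger$ is
the adjoint of $A$:

$$A^\dagger = \begin{pmatrix} 1 & 0 \\ -i & 1-i \\ 0 & 3 \end{pmatrix},$$

so

$$L_{A^\dagger} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = A^\dagger \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} w_1 \\ -i w_1 + (1-i) w_2 \\ 3 w_2 \end{pmatrix}.$$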
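
The identity $\langle Av, w \rangle = \langle v, A^\dagger w \rangle$ can be spot-checked numerically. The following is a small sketch using plain Python lists and built-in complex numbers; the helper names and the test vectors `v`, `w` are our own choices, and a single pair of vectors is of course only a sanity check, not a proof.

```python
def conj_transpose(A):
    """Adjoint (conjugate transpose) of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inner(v, w):
    """Standard inner product on C^n: linear in v, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

A = [[1, 1j, 0], [0, 1 + 1j, 3]]
A_dag = conj_transpose(A)

v = [1 + 2j, -1j, 2]     # arbitrary vector in C^3
w = [3 - 1j, 1j]         # arbitrary vector in C^2
lhs = inner(matvec(A, v), w)
rhs = inner(v, matvec(A_dag, w))
print(A_dag, lhs, rhs)

# The real example T(x,y,z) = (x+2y+3z, 4x+5y+6z): T*(10, 1) = (14, 25, 36).
print(matvec(conj_transpose([[1, 2, 3], [4, 5, 6]]), [10, 1]))
```

Both sides of the identity agree on the test vectors, and the real matrix reproduces $T^*(10,1) = (14, 25, 36)$ from the earlier example.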


• Some basic properties of adjoints. Firstly, the process of taking adjoints
is conjugate linear: if $T : V \to W$ and $U : V \to W$ are linear
transformations, and $c$ is a scalar, then $(T + U)^* = T^* + U^*$ and $(cT)^* = \overline{c} T^*$.
Let's just prove the second claim, as the first is similar (or can be found
in the textbook). We look at the expression $\langle v, (cT)^* w \rangle$ for any $v \in V$
and $w \in W$, and compute:

$$\langle v, (cT)^* w \rangle = \langle cTv, w \rangle = c \langle Tv, w \rangle = c \langle v, T^* w \rangle = \langle v, \overline{c} T^* w \rangle.$$

Since this identity is true for all $v \in V$, we thus have (by Lemma 2)
that $(cT)^* w = \overline{c} T^* w$ for all $w \in W$, and so $(cT)^* = \overline{c} T^*$ as desired.


• This argument shows a key trick in understanding adjoints: in order to
understand a transformation $T$ or its adjoint, it is often a good idea to
start by looking at the expression $\langle Tv, w \rangle = \langle v, T^* w \rangle$ and rewrite it in
some other way.




• Some other properties, which we leave as exercises: $(T^*)^* = T$ (i.e.
if $T^*$ is the adjoint of $T$, then $T$ is the adjoint of $T^*$); the adjoint of
the identity operator is again the identity; and if $T : V \to W$ and
$S : U \to V$ are linear transformations, then $(TS)^* = S^* T^*$. (This last
identity can be verified by playing around with $\langle u, S^* T^* w \rangle$ for $u \in U$
and $w \in W$). If $T$ is invertible, we also have $(T^{-1})^* = (T^*)^{-1}$ (i.e. the
inverse of the adjoint is the adjoint of the inverse). This can be seen
by starting with the identity $TT^{-1} = T^{-1}T = I$ and taking adjoints of
all sides.


e Another useful property is that a matrix has the same rank as its 
adjoint. To see this, recall that the adjoint of a matrix is the conjugate 
of its transpose. From Lemma 7 of week 6 notes, we know that a 
matrix has the same rank as its transpose. It is also easy to see that 
a matrix has the same rank as its conjugate (this is basically because 
the conjugate of an elementary matrix is again an elementary matrix, 
and the conjugate of a matrix in row-echelon form is again a matrix in 
row echelon form.) Combining these two observations we see that the 
adjoint of a matrix must also have the same rank. From Theorem 3 
(and Lemma 9 of week 6 notes) we see therefore that a linear operator 
from one finite-dimensional inner product space to another has the 
same rank as its adjoint. 


• In a similar vein, if $A$ is a square matrix with determinant $d$, then $A^\dagger$
will have determinant $\overline{d}$. (We will only sketch a proof of this fact here:
first prove it for elementary matrices, and for diagonal matrices. Then
to handle the general case, use Proposition 5 from week 6 notes, as well
as the identity $(BA)^\dagger = A^\dagger B^\dagger$).


* * * * *


Normal operators 


• Recall that in the Week 7 notes we discussed the problem of whether a
linear transformation was diagonalizable, i.e. whether it had a basis of
eigenvectors. We did not fully resolve this question, and in fact we will
not be able to give a truly satisfactory answer to this question until
Math 115B. However, there is a special class of linear transformations
(aka operators) for which we can give a good answer: normal operators.




• Definition Let $T : V \to V$ be a linear transformation on $V$, so that
the adjoint $T^* : V \to V$ is another linear transformation on $V$. We say
that $T$ is normal if $TT^* = T^*T$.


• Example 1 Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the linear transformation $T(x, y) :=
(y, -x)$. Then $T^* : \mathbf{R}^2 \to \mathbf{R}^2$ can be computed to be the linear
transformation $T^*(x, y) = (-y, x)$ (why?), and so

$$TT^*(x, y) = T(-y, x) = (x, y)$$

and

$$T^*T(x, y) = T^*(y, -x) = (x, y).$$

Thus $TT^*(x, y)$ and $T^*T(x, y)$ agree for all $(x, y) \in \mathbf{R}^2$, which implies
that $TT^* = T^*T$. Thus this transformation is normal.


• Example 2 Let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the linear transformation $T(x, y) :=
(0, x)$. Then $T^*(x, y) = (y, 0)$ (why?). So

$$TT^*(x, y) = T(y, 0) = (0, y)$$

and

$$T^*T(x, y) = T^*(0, x) = (x, 0).$$

So in general $TT^*(x, y)$ and $T^*T(x, y)$ are not equal, and so $TT^* \neq T^*T$.
Thus this transformation is not normal.


• In analogy to the above definition, we define a square matrix $A$ to be
normal if $AA^\dagger = A^\dagger A$. For instance, the matrix

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

can easily be checked to be normal, while the matrix

$$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$$

is not. (Why do these two examples correspond to Examples 1 and 2
above?)
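The two matrix checks above are easy to carry out mechanically. The following is a minimal sketch in plain Python; the function names `adjoint`, `matmul` and `is_normal` are hypothetical helpers introduced here, and matrices are represented as lists of rows.

```python
def adjoint(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[complex(a[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def is_normal(a):
    """True if a commutes with its adjoint (exact comparison)."""
    a_dag = adjoint(a)
    return matmul(a, a_dag) == matmul(a_dag, a)


example_1 = [[0, 1], [-1, 0]]   # matrix of T(x, y) = (y, -x)
example_2 = [[0, 0], [1, 0]]    # matrix of T(x, y) = (0, x)

print(is_normal(example_1))  # True
print(is_normal(example_2))  # False
```

The exact `==` comparison is only appropriate because the entries are small integers; with general floating-point entries a tolerance would be needed.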




Another example, easily checked: every diagonal matrix is normal.

From Theorem 3 we have

Proposition 5. Let $T : V \to V$ be a linear transformation on a
finite-dimensional inner product space, and let $\beta$ be an orthonormal basis.
Then $T : V \to V$ is normal if and only if the matrix $[T]_\beta^\beta$ is.

Proof. If $T$ is normal, then $TT^* = T^*T$. Now taking matrices with
respect to $\beta$, we obtain

$$[T]_\beta^\beta [T^*]_\beta^\beta = [T^*]_\beta^\beta [T]_\beta^\beta.$$

But by Theorem 3, $[T^*]_\beta^\beta$ is the adjoint of $[T]_\beta^\beta$. Thus $[T]_\beta^\beta$ is normal.
This proves the “only if” portion of the Proposition; the “if” part
follows by reversing the above steps.


Normal transformations have several nice properties. First of all, when 
T is normal then T and T* will have the same eigenvectors (but slightly 
different eigenvalues): 


Lemma 6. Let $T : V \to V$ be normal, and suppose that $Tv = \lambda v$ for
some vector $v \in V$ and some scalar $\lambda$. Then $T^*v = \bar{\lambda} v$.


Warning: the above lemma is only true for normal operators! For other
linear transformations, it is quite possible that $T$ and $T^*$ have totally
different eigenvectors and eigenvalues.


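Proof To show $T^*v = \bar{\lambda} v$, it suffices to show that $\|T^*v - \bar{\lambda} v\| = 0$,
which in turn will follow if we can show that

$$\langle T^*v - \bar{\lambda} v, T^*v - \bar{\lambda} v \rangle = 0.$$

We expand out the left-hand side as

$$\langle T^*v, T^*v \rangle - \langle \bar{\lambda} v, T^*v \rangle - \langle T^*v, \bar{\lambda} v \rangle + \langle \bar{\lambda} v, \bar{\lambda} v \rangle.$$

Pulling the $\lambda$s out and swapping the $T$s over, this becomes

$$\langle v, TT^*v \rangle - \bar{\lambda} \langle Tv, v \rangle - \lambda \langle v, Tv \rangle + \lambda \bar{\lambda} \langle v, v \rangle.$$

Since $T$ is normal and $Tv = \lambda v$, we have $TT^*v = T^*Tv = \lambda T^*v$. Thus
we can rewrite this expression as

$$\bar{\lambda} \langle v, T^*v \rangle - \bar{\lambda} \lambda \langle v, v \rangle - \lambda \bar{\lambda} \langle v, v \rangle + \lambda \bar{\lambda} \langle v, v \rangle.$$

But $\langle v, T^*v \rangle = \langle Tv, v \rangle = \lambda \langle v, v \rangle$. If we insert this in the above
expression we then see that everything cancels to zero, as desired.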
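Lemma 6 and the accompanying warning can be illustrated numerically. The sketch below uses the two matrices from Examples 1 and 2 above; the helper names `adjoint` and `apply` are hypothetical, and vectors are plain Python lists.

```python
def adjoint(a):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(a)
    return [[complex(a[i][j]).conjugate() for i in range(n)] for j in range(n)]


def apply(a, v):
    """Matrix-vector product."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def scale(c, v):
    return [c * x for x in v]


# Normal case: A is the matrix of T(x, y) = (y, -x), viewed over C.
A = [[0, 1], [-1, 0]]
v = [1, 1j]          # eigenvector of A with eigenvalue i
lam = 1j

normal_eigen_ok = apply(A, v) == scale(lam, v)
# Lemma 6 predicts that v is an eigenvector of the adjoint with eigenvalue conj(lam).
normal_adjoint_ok = apply(adjoint(A), v) == scale(lam.conjugate(), v)

# Non-normal case: B is the matrix of T(x, y) = (0, x).
B = [[0, 0], [1, 0]]
u = [0, 1]           # eigenvector of B with eigenvalue 0
nonnormal_eigen_ok = apply(B, u) == scale(0, u)
# Here the conclusion of Lemma 6 fails: the adjoint does not send u to 0.
nonnormal_adjoint_ok = apply(adjoint(B), u) == scale(0, u)

print(normal_eigen_ok, normal_adjoint_ok)        # True True
print(nonnormal_eigen_ok, nonnormal_adjoint_ok)  # True False
```

The second pair of checks shows why the hypothesis of normality cannot be dropped: `u` is an eigenvector of `B` but not of its adjoint.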


Lemma 7. Let $T : V \to V$ be normal, and let $v_1, v_2$ be two
eigenvectors of $T$ with distinct eigenvalues $\lambda_1, \lambda_2$. Then $v_1$ and $v_2$ must be
orthogonal.

(Compare this with Proposition 6 of the Week 8 notes, which merely
asserts that these vectors $v_1$ and $v_2$ are linearly independent. Again, we
caution that this orthogonality of eigenvectors is only true for normal
operators.)

Proof. We have $Tv_1 = \lambda_1 v_1$ and $Tv_2 = \lambda_2 v_2$. By Lemma 6 we thus
have $T^*v_1 = \bar{\lambda}_1 v_1$ and $T^*v_2 = \bar{\lambda}_2 v_2$. Thus

$$\lambda_1 \langle v_1, v_2 \rangle = \langle Tv_1, v_2 \rangle = \langle v_1, T^*v_2 \rangle = \langle v_1, \bar{\lambda}_2 v_2 \rangle = \lambda_2 \langle v_1, v_2 \rangle.$$

Since $\lambda_1 \neq \lambda_2$, this means that $\langle v_1, v_2 \rangle = 0$, and so $v_1$ and $v_2$ are
orthogonal as desired.


This lemma tells us that most linear transformations will not be normal, 
because in general the eigenvectors corresponding to different eigenval- 
ues will not be orthogonal. (Take for instance the matrix involved in 
the Fibonacci rabbit example). 


In the other direction, if we have an orthonormal basis of eigenvectors, 
then the transformation must be normal: 


Lemma 8. Let $T : V \to V$ be a linear transformation on an inner
product space $V$, and let $\beta$ be an orthonormal basis which consists
entirely of eigenvectors of $T$. Then $T$ is normal.

Compare this lemma to Lemma 2 of Week 7 notes, which says that if
you have a basis of eigenvectors (not necessarily orthonormal), then $T$
is diagonalizable.




Proof. From Lemma 2 of Week 7 notes, we know that the matrix $[T]_\beta^\beta$
is diagonal. But all diagonal matrices are normal (why?), and so $[T]_\beta^\beta$
is normal. By Proposition 5 we thus see that $T$ is normal.


We now come to an important theorem, that the converse of Lemma 8 
is also true: 


Spectral theorem for normal operators Let $T : V \to V$ be a
normal linear transformation on a complex finite dimensional inner
product space $V$. Then there is an orthonormal basis $\beta$ consisting
entirely of eigenvectors of $T$. In particular, $T$ is diagonalizable.


Thus normal linear transformations are precisely those diagonalizable 
linear transformations which can be diagonalized using orthonormal 
bases (as opposed to just being plain diagonalizable, using bases which 
might not be orthonormal). 


There is also a spectral theorem for normal operators on infinite dimen- 
sional inner product spaces, but it is beyond the scope of this course. 


Proof Let the dimension of V be n. We shall prove this theorem by 
induction on n. 


First consider the base case $n = 1$. Then one can pick any orthonormal
basis $\beta$ of $V$ (which in this case will just be a single unit vector), and the
vector $v$ in this basis will automatically be an eigenvector of $T$ (because
in a one-dimensional space every vector will be a scalar multiple of $v$).
So the spectral theorem is trivially true when $n = 1$.


Now suppose inductively that $n > 1$, and that the theorem has already
been proven for dimension $n - 1$. Let $f(\lambda)$ be the characteristic
polynomial of $T$ (or of any matrix representation $[T]_\beta^\beta$ of $T$; recall that any
two such matrix representations are similar and thus have the same
characteristic polynomial). From the fundamental theorem of algebra,
we know that this characteristic polynomial splits over the complex
numbers. Hence there must be at least one root of this polynomial,
and hence $T$ has at least one (complex) eigenvalue, and hence at least
one eigenvector.




• So now let us pick an eigenvector $v_1$ of $T$ with eigenvalue $\lambda_1$, thus
$Tv_1 = \lambda_1 v_1$ and $T^*v_1 = \bar{\lambda}_1 v_1$ by Lemma 6. We can normalize $v_1$ to have
length 1, so $\|v_1\| = 1$ (remember that if you multiply an eigenvector by
a non-zero scalar you still get an eigenvector, so it’s safe to normalize
eigenvectors). Let $W := \{cv_1 : c \in \mathbf{C}\}$ denote the span of this
eigenvector, thus $W$ is a one-dimensional space. Let $W^\perp := \{v \in V : v \perp v_1\}$
denote the orthogonal complement of $W$; this is thus an $(n-1)$-dimensional
space.


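• Now we see what $T$ and $T^*$ do to $W^\perp$. Let $w$ be any vector in $W^\perp$,
thus $w \perp v_1$, i.e. $\langle w, v_1 \rangle = 0$. Then

$$\langle Tw, v_1 \rangle = \langle w, T^*v_1 \rangle = \langle w, \bar{\lambda}_1 v_1 \rangle = \lambda_1 \langle w, v_1 \rangle = 0$$

and similarly

$$\langle T^*w, v_1 \rangle = \langle w, Tv_1 \rangle = \langle w, \lambda_1 v_1 \rangle = \bar{\lambda}_1 \langle w, v_1 \rangle = 0.$$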


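Thus if $w \in W^\perp$, then $Tw$ and $T^*w$ are also in $W^\perp$. Thus $T$ and $T^*$
are not only linear transformations from $V$ to $V$, they are also linear
transformations from $W^\perp$ to $W^\perp$. Also, we have

$$\langle Tw, w' \rangle = \langle w, T^*w' \rangle$$

for all $w, w' \in W^\perp$, because every vector in $W^\perp$ is a vector in $V$, and
we already have this property for vectors in $V$. Thus $T$ and $T^*$ are still
adjoints of each other even after we restrict the vector space from the
$n$-dimensional space $V$ to the $(n-1)$-dimensional space $W^\perp$. In particular,
the restriction of $T$ to $W^\perp$ is still normal, since the identity $TT^* = T^*T$
continues to hold on $W^\perp$.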


• We now apply the induction hypothesis, and find that $W^\perp$ enjoys an
orthonormal basis of eigenvectors of $T$. There are $n - 1$ such
eigenvectors, since $W^\perp$ is $(n-1)$-dimensional. Now $v_1$ is normalized and is
orthogonal to all the vectors in this basis, since $v_1$ lies in $W$ and all the
other vectors lie in $W^\perp$. Thus if we add $v_1$ to this basis we get a new
collection of $n$ orthonormal vectors, which automatically form a basis
by Corollary 5 of Week 9 notes. Each of these vectors is an eigenvector
of $T$, and so we are done.


• Example The linear transformation $T : \mathbf{R}^2 \to \mathbf{R}^2$ defined by $T(x, y) :=
(y, -x)$ that we discussed earlier is normal, but not diagonalizable (its
characteristic polynomial is $\lambda^2 + 1$, which doesn’t split over the reals).
This does not contradict the spectral theorem because that only
concerns complex inner product spaces. If however we consider the
complex linear transformation $T : \mathbf{C}^2 \to \mathbf{C}^2$ defined by $T(z, w) := (w, -z)$,
then we can find an orthonormal basis of eigenvectors, namely

$$v_1 := \frac{1}{\sqrt{2}}(1, i); \quad v_2 := \frac{1}{\sqrt{2}}(1, -i).$$

(Exercise: cover up the above line and see if you can find these
eigenvectors on your own). Indeed, you can check that $v_1$ and $v_2$ are
orthonormal, and that $Tv_1 = iv_1$ and $Tv_2 = -iv_2$. Thus we can diagonalize $T$
using an orthonormal basis, to become the diagonal matrix $\mathrm{diag}(i, -i)$.
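The checks requested in this example can be done by hand, but here is a small numerical sketch as well. It assumes the standard inner product on $\mathbf{C}^2$ (linear in the first slot, conjugate-linear in the second) and takes the two candidate eigenvectors to be $(1, i)/\sqrt{2}$ and $(1, -i)/\sqrt{2}$; the helper names are hypothetical.

```python
import math


def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def T(vec):
    """The complex linear transformation T(z, w) = (w, -z)."""
    z, w = vec
    return (w, -z)


def close(a, b, tol=1e-12):
    return abs(a - b) < tol


def is_eigenvector(vec, lam):
    """Check T(vec) = lam * vec up to rounding error."""
    return all(close(t, lam * x) for t, x in zip(T(vec), vec))


s = 1 / math.sqrt(2)
v1 = (s, s * 1j)
v2 = (s, -s * 1j)

orthonormal = (close(inner(v1, v1), 1) and close(inner(v2, v2), 1)
               and close(inner(v1, v2), 0))

print(orthonormal)               # True
print(is_eigenvector(v1, 1j))    # True
print(is_eigenvector(v2, -1j))   # True
```

A tolerance is used because $1/\sqrt{2}$ is not exactly representable in floating point.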


* * * * *


Self-adjoint operators 


• To summarize the previous section: in the world of complex inner product
spaces, normal linear transformations (aka normal operators) are
the best kind of linear transformations: they are not only diagonaliz- 
able, but they are diagonalizable using the best kind of basis, namely 
an orthonormal basis. However, there is a subclass of normal transfor- 
mations which are even better: the self-adjoint transformations. 


• Definition A linear transformation $T : V \to V$ on a finite-dimensional
inner product space $V$ is said to be self-adjoint if $T^* = T$, i.e. $T$ is its
own adjoint. A square matrix $A$ is said to be self-adjoint if $A^\dagger = A$,
i.e. $A$ is its own adjoint.


• Example. The linear transformation $T : \mathbf{R}^2 \to \mathbf{R}^2$ defined by $T(x, y) :=
(y, -x)$ is normal, but not self-adjoint, because its adjoint $T^*(x, y) =
(-y, x)$ is not the same as $T$. However, the linear transformation
$T : \mathbf{R}^2 \to \mathbf{R}^2$ defined by $T(x, y) := (y, x)$ is self-adjoint, because its
adjoint is given by $T^*(x, y) = (y, x)$ (why?), and this is the same as $T$.


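• Example. The matrix

$$A := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

is normal, but not self-adjoint, because its adjoint

$$A^\dagger = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

is not the same as $A$. However, the matrix

$$A := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

is self-adjoint, because its adjoint

$$A^\dagger = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

is the same as $A$. (Why does this example correspond to the preceding
one? It is easy to check, using Proposition 5, that a linear
transformation is self-adjoint if and only if its matrix in some orthonormal basis
is self-adjoint).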


Example. Every real diagonal matrix is self-adjoint, but any other
type of diagonal matrix is not (e.g. $\mathrm{diag}(2+i, 4+3i)$ has an adjoint of
$\mathrm{diag}(2-i, 4-3i)$ and is hence not self-adjoint, though it is still normal).


It is clear that all self-adjoint linear transformations are normal, since
if $T^* = T$ then $T^*T$ and $TT^*$ are both equal to $T^2$ and are hence equal
to each other. Similarly, every self-adjoint matrix is normal. However,
not every normal matrix is self-adjoint, and not every normal linear
transformation is self-adjoint; see the above examples.


A self-adjoint transformation over a complex inner product space is
sometimes known as a Hermitian transformation. A self-adjoint
transformation over a real inner product space is known as a symmetric
transformation. Similarly, a complex self-adjoint matrix is known as a
Hermitian matrix, while a real self-adjoint matrix is known as a
symmetric matrix. (A matrix is symmetric if $A^t = A$. When the matrix
is real, the transpose $A^t$ is the same as the adjoint, thus self-adjoint
and symmetric have the same meaning for real matrices, but not for
complex matrices).




Example The matrix

$$A := \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$$


is its own adjoint (why?), and is hence Hermitian, but it is not symmet- 
ric, since it is not its own transpose. Note that every real symmetric 
matrix is automatically Hermitian, because every real matrix is also a 
complex matrix (with all the imaginary parts equal to 0). 
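The distinction between "symmetric" and "self-adjoint" is easy to test mechanically. The sketch below is illustrative only: the helper names are hypothetical, and the sample Hermitian matrix with off-diagonal entries $i$ and $-i$ is an assumption chosen for the demonstration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def adjoint(m):
    """Conjugate transpose."""
    return [[complex(x).conjugate() for x in row] for row in transpose(m)]


def is_symmetric(m):
    return m == transpose(m)


def is_self_adjoint(m):
    return m == adjoint(m)


hermitian_example = [[0, 1j], [-1j, 0]]      # assumed sample matrix
real_symmetric = [[0, 1], [1, 0]]
rotation = [[0, 1], [-1, 0]]                 # normal, but not self-adjoint
complex_diagonal = [[2 + 1j, 0], [0, 4 + 3j]]

for name, m in [("hermitian_example", hermitian_example),
                ("real_symmetric", real_symmetric),
                ("rotation", rotation),
                ("complex_diagonal", complex_diagonal)]:
    print(name, "symmetric:", is_symmetric(m),
          "self-adjoint:", is_self_adjoint(m))
```

For real matrices the two tests always agree; they can only differ once some entry has a non-zero imaginary part.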


From the spectral theorem for normal matrices, we know that any Her- 
mitian operator on a complex inner product space has an orthonormal 
basis of eigenvectors. But we can say a little bit more: 


Theorem 9 All the eigenvalues of a Hermitian operator are real.

Proof. Let $\lambda$ be an eigenvalue of a Hermitian operator $T$, thus $Tv = \lambda v$
for some non-zero eigenvector $v$. But then by Lemma 6, $T^*v = \bar{\lambda} v$. But
since $T$ is Hermitian, $T = T^*$, and hence $\lambda v = \bar{\lambda} v$. Since $v$ is non-zero,
this means that $\lambda = \bar{\lambda}$, i.e. $\lambda$ is real. Thus all the eigenvalues of $T$ are
real.


A similar line of reasoning shows that all the eigenvalues of a Hermitian 
matrix are real. 
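For $2 \times 2$ matrices this can be seen concretely from the quadratic formula applied to the characteristic polynomial $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A)$. The sketch below is an illustration rather than part of the proof; the function name and the sample matrices are assumptions made for the example.

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial x^2 - tr(m) x + det(m)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)


hermitian = [[2, 1 - 1j], [1 + 1j, 3]]   # equal to its conjugate transpose
rotation = [[0, 1], [-1, 0]]             # normal but not Hermitian

herm_eigs = eigenvalues_2x2(hermitian)
rot_eigs = eigenvalues_2x2(rotation)

# The Hermitian matrix has real eigenvalues (4 and 1), while the
# rotation matrix has the purely imaginary eigenvalues i and -i.
print(herm_eigs)
print(rot_eigs)
```

For a Hermitian matrix with diagonal entries $a, d$ and off-diagonal entry $b$, the discriminant is $(a - d)^2 + 4|b|^2 \geq 0$, which is why the square root above comes out real in the first case.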


Corollary 10. The characteristic polynomial of a Hermitian matrix 
splits over the reals. 


Proof. We know already from the Fundamental Theorem of Algebra 
that the characteristic polynomial splits over the complex numbers. 
But since the matrix is Hermitian, every root of the characteristic poly- 
nomial must be real. Thus the polynomial must split over the reals. 
□


We can now prove 


Spectral theorem for self-adjoint operators Let $T$ be a self-adjoint
linear transformation on a finite-dimensional inner product space $V$
(which can be either real or complex). Then there is an orthonormal basis
of $V$ which consists entirely of eigenvectors of $T$, with real eigenvalues.




• Proof. We repeat the proof of the spectral theorem for normal
operators, i.e. we do an induction on the dimension $n$ of the space $V$. When
$n = 1$ the claim is again trivial (and we use Theorem 9 to make sure
the eigenvalue is real). Now suppose inductively that
$n > 1$ and the claim has already been proven for $n - 1$.

From Corollary 10 we know that $T$ has at least one real eigenvalue.
Thus we can find a real $\lambda_1$ and a non-zero vector $v_1$ such that $Tv_1 =
\lambda_1 v_1$. We can then normalize $v_1$ to have unit length. We now repeat
the rest of the proof of the spectral theorem for normal operators, to
obtain the same conclusion except that the eigenvalues are now real.
□


• Notice one subtle difference between the spectral theorem for
self-adjoint operators and the spectral theorem for normal operators: the
spectral theorem for normal operators requires the inner product space
to be complex, but the one for self-adjoint operators does not. In
particular, every symmetric operator on a real finite-dimensional inner
product space is diagonalizable.
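To illustrate the real case, here is a small sketch that diagonalizes the real symmetric matrix of $T(x, y) = (y, x)$ using a real orthonormal basis. The choice of eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$ and all helper names are assumptions made for this example.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def close_matrix(a, b, tol=1e-12):
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


s = 1 / math.sqrt(2)
A = [[0, 1], [1, 0]]          # real symmetric
Q = [[s, s], [s, -s]]         # columns are the proposed eigenvectors

# Q has orthonormal columns exactly when Q^t Q is the identity.
gram = matmul(transpose(Q), Q)
# In the new basis the matrix becomes Q^t A Q, which should be diagonal.
D = matmul(transpose(Q), matmul(A, Q))

print(close_matrix(gram, [[1, 0], [0, 1]]))   # True
print(close_matrix(D, [[1, 0], [0, -1]]))     # True
```

Everything here stays within the real numbers, in contrast with the rotation example earlier, which has no real eigenvalues at all.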


• Example The matrix

$$A := \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$$

is Hermitian, and thus so is the linear transformation $L_A : \mathbf{C}^2 \to \mathbf{C}^2$,
which is given by

$$L_A \begin{pmatrix} z \\ w \end{pmatrix} = \begin{pmatrix} iw \\ -iz \end{pmatrix}.$$

By the spectral theorem, $\mathbf{C}^2$ must have an orthonormal basis of
eigenvectors with real eigenvalues. One such basis is

$$v_1 := \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \end{pmatrix}; \quad v_2 := \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -i \end{pmatrix};$$

one can verify that $v_1$ and $v_2$ are an orthonormal basis for the complex
two-dimensional inner product space $\mathbf{C}^2$, and that $L_A v_1 = -v_1$ and
$L_A v_2 = +v_2$. Thus $L_A$ can be diagonalized using an orthonormal basis
to give the matrix $\mathrm{diag}(-1, +1)$. Note that while the eigenvalues of
$L_A$ are real, the eigenvectors are still complex. The spectral theorem
says nothing as to how real or complex the eigenvectors are (indeed,
in many inner product spaces, such a question does not really make
sense).


Self-adjoint operators are thus the very best of all operators: not only 
are they diagonalizable, with an orthonormal basis of eigenvectors, the 
eigenvalues are also real. (Conversely, it is easy to modify Lemma 8 to 
show that any operator with these properties is necessarily self-adjoint). 
Fortunately, self-adjoint operators come up all over the place in real 
life. For instance, in quantum mechanics, almost all the linear trans- 
formations one sees there are Hermitian (this is basically because while 
quantum mechanics uses complex inner product spaces, the quantities 
we can actually observe in physical reality must be real-valued). 




Assignment 1 Due October 10 Covers: Sections 1.1-1.6 
• Q1. Let $V$ be the space of real 3-tuples

$$V = \{(x_1, x_2, x_3) : x_1, x_2, x_3 \in \mathbf{R}\}$$

with the standard addition rule

$$(x_1, x_2, x_3) + (y_1, y_2, y_3) = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$$

but with the non-standard scalar multiplication rule

$$c(x_1, x_2, x_3) = (cx_1, x_2, x_3).$$

(In other words, $V$ is the same thing as the vector space $\mathbf{R}^3$, but with
the scalar multiplication law changed so that the scalar only multiplies
the first co-ordinate of the vector.)

Show that $V$ is not a vector space.

e Q2. (a) Find a subset of R? which is closed under scalar multiplication, 
but is not closed under vector addition. 
(b) Find a subset of R® which is closed under vector addition, but not 


under scalar multiplication. 


e Q3. Find three distinct non-zero vectors u, v, w in R? such that span({u, v}) = 
span({v,w}) = span({u, v, w}), but such that span({u, w}) ¢ span({u, v, w}). 


e Q4. Find a basis for M3...(R), the vector space of 2 x 2 matrices with 
trace zero. Explain why the set you chose is indeed a basis. 


e Q5. Do Exercise 1(abghk) of Section 1.2 in the textbook. 
e Q6. Do Exercise 8(aef) of Section 1.2 in the textbook. 
e Q7. Do Exercise 23 of Section 1.3 in the textbook. 


• Q8*. Do Exercise 19 of Section 1.3 in the textbook. [Hint: Prove
by contradiction. If $W_1 \not\subseteq W_2$, then there must be a vector $w_1$ which
lies in $W_1$ but not in $W_2$. Similarly, if $W_2 \not\subseteq W_1$, then there must be
a vector $w_2$ which lies in $W_2$ but not in $W_1$. Now suppose that both
$W_1 \not\subseteq W_2$ and $W_2 \not\subseteq W_1$, and consider what one can say about the
vector $w_1 + w_2$.]




e Q9. Do Exercise 4(a) of Section 1.4 in the textbook. 


e Q10. Do Exercise 1(abdef) of Section 1.5 in the textbook. 




Assignment 2 Due October 17 Covers: Sections 1.6-2.1 
• Q1. Do Exercise 1(acdejk) of Section 1.6 in the textbook.

• Q2. Do Exercise 3(b) of Section 1.6 in the textbook.
• Q3. Do Exercise 7 of Section 2.1 in the textbook.
• Q4. Do Exercise 9 of Section 2.1 in the textbook.

• Q5. Find a polynomial $f(x)$ of degree at most three, such that $f(n) =
2^n$ for all $n = 0, 1, 2, 3$.

• Q6. Let $V$ be a vector space, and let $A$, $B$ be two subsets of $V$. Suppose
that $B$ spans $V$, and that $\mathrm{span}(A)$ contains $B$. Show that $A$ spans $V$.

• Q7*. Let $V$ be a vector space which is spanned by a finite set $S$ of
$n$ elements. Show that $V$ is finite dimensional, with dimension less
than or equal to $n$. [Note: You cannot apply the Dimension Theorem
directly, because we have not assumed that $V$ is finite dimensional. To
do that, we must first construct a finite basis for $V$; this can be done
by modifying the proof of part (g) of the Dimension theorem, or the
proof of Theorem 2.]

• Q8*. Show that $F(\mathbf{R}, \mathbf{R})$, the space of functions from $\mathbf{R}$ to $\mathbf{R}$, is
infinite-dimensional.

• Q9. Let $V$ be a vector space of dimension 5, and let $W$ be a subspace of
$V$ of dimension 3. Show that there exists a vector space $U$ of dimension
4 such that $W \subseteq U \subseteq V$.

• Q10. Let $v_1 := (1, 0, 0)$, $v_2 := (0, 1, 0)$, $v_3 := (0, 0, 1)$, $v_4 := (1, 1, 0)$ be
four vectors in $\mathbf{R}^3$, and let $S$ denote the set $S := \{v_1, v_2, v_3, v_4\}$. The set
$S$ has 16 subsets, which are depicted on the reverse of this assignment.
(This graph, incidentally, depicts (the shadow of) a tesseract, or
4-dimensional cube).

• Of these subsets, which ones span $\mathbf{R}^3$? Which ones are linearly
independent? Which ones are bases? (Feel free to color in the graph and
turn it in with your assignment. You may find Corollary 1 in the Week
2 notes handy).




Assignment 3 Due October 24 Covers: Sections 2.1-2.3 

• Q1. Do Exercise 1(cdfh) of Section 2.1 of the textbook.
• Q2. Do Exercise 7 of Section 2.1 of the textbook.
• Q3. Do Exercise 10 of Section 2.1 of the textbook.
• Q4*. Do Exercise 17 of Section 2.1 of the textbook.
• Q5. Do Exercise 1(bcdf) of Section 2.2 of the textbook.
• Q6. Do Exercise 2(aceg) of Section 2.2 of the textbook.
• Q7. Do Exercise 7 of Section 2.2 of the textbook.

• Q8. (a) Let $V$, $W$ be vector spaces, and let $T : V \to W$ be a linear
transformation. Let $U$ be a subspace of $W$. Show that the set

$$T^{-1}(U) := \{v \in V : T(v) \in U\}$$

is a subspace of $V$. Explain why this shows that the null space $N(T)$
is also a subspace.

• (b) Let $V$, $W$ be vector spaces, and let $T : V \to W$ be a linear
transformation. Let $X$ be a subspace of $V$. Show that the set

$$T(X) := \{Tv : v \in X\}$$

is a subspace of $W$. Explain why this shows that the range $R(T)$ is
also a subspace.

• Q9*. Show, without doing Gaussian elimination or any other
computation, that there must be a solution to the system

$$\begin{aligned}
12x_1 + 34x_2 + 56x_3 + 78x_4 &= 0 \\
3x_1 + 6x_2 + 2x_3 + 10x_4 &= 0 \\
43x_1 + 21x_2 + 98x_3 + 76x_4 &= 0
\end{aligned}$$

such that the $x_1, x_2, x_3, x_4$ are not all equal to zero. [Hint: consider
the linear transformation $T : \mathbf{R}^4 \to \mathbf{R}^3$ defined by

$$T(x_1, x_2, x_3, x_4) := (12x_1 + 34x_2 + 56x_3 + 78x_4,\; 3x_1 + 6x_2 + 2x_3 + 10x_4,\; 43x_1 + 21x_2 + 98x_3 + 76x_4).$$

What can you say about the rank and nullity of $T$?]

• Q10. Find a non-zero vector $v \in \mathbf{R}^2$, and two ordered bases $\beta$, $\beta'$ of
$\mathbf{R}^2$, such that $[v]^\beta = [v]^{\beta'}$ but that $\beta \neq \beta'$.




Assignment 4 Due October 31 Covers: Sections 2.3-2.4 


Ql. Do Exercise 5(cdefg) of Section 2.2 of the textbook. 
Q2. Do Exercise 1(aegij) of Section 2.3 of the textbook. 
Q3. Do Exercise 4(c) of Section 2.3 of the textbook. 


Q4. Do Exercise 10 of Section 2.3 of the textbook. ($T_0$ is the zero
transformation, so that $T_0 v = 0$ for all $v \in V$.)


Q5. Do Exercise 1(bcdefhi) of Section 2.4 of the textbook.

Q6. Do Exercise 2 of Section 2.4 of the textbook. 

Q7. Do Exercise 4 of Section 2.4 of the textbook. 

Q8*. Do Exercise 9 of Section 2.4 of the textbook. 

Q9. Let U, V, W be vector spaces. 

(a) Show that U is isomorphic to U. 

(b) Show that if U is isomorphic to V, then V is isomorphic to U. 


(c) Show that if U is isomorphic to V, and V is isomorphic to W, then 
U is isomorphic to W. 


(Incidentally, the above three properties (a)-(c) together mean that 
isomorphism is an equivalence relation). 


Q10. From our notes on Lagrange interpolation, we know that given
any three numbers $y_1, y_2, y_3$, there exists an interpolating polynomial
$f \in P_2(\mathbf{R})$ such that $f(0) = y_1$, $f(1) = y_2$, and $f(2) = y_3$. Define the
map $T : \mathbf{R}^3 \to P_2(\mathbf{R})$ by setting $T(y_1, y_2, y_3) := f$. (Thus for instance
$T(0, 1, 4) = x^2$). Let $\alpha := ((1, 0, 0), (0, 1, 0), (0, 0, 1))$ be the standard
basis for $\mathbf{R}^3$, and let $\beta := (1, x, x^2)$ be the standard basis for $P_2(\mathbf{R})$.

(a) Compute the matrix $[T]_\alpha^\beta$. (You may assume without proof that $T$
is linear.)

(b) Let $S : P_2(\mathbf{R}) \to \mathbf{R}^3$ be the map

$$Sf := (f(0), f(1), f(2)).$$

Compute the matrix $[S]_\beta^\alpha$. (Again, you may assume without proof that
$S$ is linear).

(c) Use matrix multiplication to verify the identities

$$[S]_\beta^\alpha [T]_\alpha^\beta = [T]_\alpha^\beta [S]_\beta^\alpha = I_3,$$

where $I_3$ is the $3 \times 3$ identity matrix. Can you explain why these
identities should be true?




Midterm 
• Q1. Let $T : \mathbf{R}^4 \to \mathbf{R}^4$ be the transformation

$$T(x_1, x_2, x_3, x_4) := (0, x_1, x_2, x_3).$$

• (a) What is the rank and nullity of $T$?

• (b) Let $\beta := ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))$ be the
standard ordered basis for $\mathbf{R}^4$. Compute $[T]_\beta^\beta$, $[T^2]_\beta^\beta$, $[T^3]_\beta^\beta$, and $[T^4]_\beta^\beta$. (Here
$T^2 = T \circ T$, $T^3 = T \circ T \circ T$, etc.)

• Q2. Let $V$ denote the space

$$V = \{f \in P_3(\mathbf{R}) : f(0) = f(1) = 0\}.$$

(a) Show that $V$ is a vector space.

(b) Find a basis for $V$. (Hint: if $f(0) = f(1) = 0$, what can one say
about the factors of $f$?)

• Q3. Let $V$ and $W$ be vector spaces, and let $T : V \to W$ be a one-to-one
linear transformation. Let $U$ be a finite-dimensional subspace of
$V$. Show that the vector space

$$T(U) := \{Tu : u \in U\}$$

has the same dimension as $U$. (You may assume without proof that
$T(U)$ is a vector space).

• Q4. Let $V$ be a three-dimensional vector space with an ordered basis
$\beta := (v_1, v_2, v_3)$. Let $\gamma$ be the ordered basis $\gamma := ((1, 1, 0), (1, 0, 0), (0, 0, 1))$
of $\mathbf{R}^3$. (You may assume without proof that $\gamma$ is indeed an ordered
basis).

• Let $T : V \to \mathbf{R}^3$ be a linear transformation whose matrix representation
$[T]_\beta^\gamma$ is given by

$$[T]_\beta^\gamma = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

Compute $T(v_1 + 2v_2 + 3v_3)$.




• Q5. Find a linear transformation $T : \mathbf{R}^3 \to \mathbf{R}^3$ whose null space $N(T)$
is equal to the $z$-axis

$$N(T) = \{(0, 0, z) : z \in \mathbf{R}\}$$

and whose range $R(T)$ is equal to the plane

$$R(T) = \{(x, y, z) \in \mathbf{R}^3 : x + y + z = 0\}.$$




Assignment 5 Due November 7 Covers: Sections 2.4-2.5 


e Ql. Do exercise 15(b) of Section 2.4 of the textbook. (Note that part 
(a) of this exercise was already done in Q8(b) of Assignment 3). 


e Q2. Do exercise 1(abde) of Section 2.5 in the textbook. 
e Q3. Do exercise 2(b) of Section 2.5 in the textbook. 

• Q4. Do exercise 4 of Section 2.5 in the textbook.

e Q5. Do exercise 10 of Section 2.5 in the textbook. 


• Q6. Let $\beta := ((1, 0), (0, 1))$ be the standard basis of $\mathbf{R}^2$, and let $\beta' :=
((3, -4), (4, 3))$ be another basis of $\mathbf{R}^2$. Let $l$ be the line connecting
the origin to $(4, 3)$, and let $T : \mathbf{R}^2 \to \mathbf{R}^2$ be the operation of reflection
through $l$ (so if $v \in \mathbf{R}^2$, then $Tv$ is the reflected image of $v$ through the
line $l$).

• (a) What is $[T]_{\beta'}^{\beta'}$? (You should do this entirely by drawing pictures).

• (b) Use the change of variables formula to determine $[T]_\beta^\beta$.

• (c) If $(x, y) \in \mathbf{R}^2$, give a formula for $T(x, y)$.

• Q7. Let $T : P_n(\mathbf{R}) \to \mathbf{R}^{n+1}$ be the map

$$Tf := (f(0), f(1), f(2), \ldots, f(n)).$$

Thus, for instance if $n = 3$, then $T(x^2) = (0, 1, 4, 9)$.

• (a) Prove that $T$ is linear.
• (b) Prove that $T$ is an isomorphism.

• Q8*. Let $A$, $B$ be $n \times n$ matrices such that $AB = I_n$, where $I_n$ is the
$n \times n$ identity matrix.

• (a) Show that $L_A L_B = I_{\mathbf{R}^n}$, where $I_{\mathbf{R}^n}$ is the identity on $\mathbf{R}^n$.

• (b) Show that $L_B$ is one-to-one and onto. (Hint: Use (a) to obtain the
one-to-one property. Then use the Dimension theorem to deduce the
onto property).




(c) Show that $L_B L_A = I_{\mathbf{R}^n}$. (Hint: First use (a) to show that
$L_B L_A L_B = L_B$, and then use the fact that $L_B$ is onto).


(d) Show that $BA = I_n$.


(To summarize the result of this problem: if one wants to show that
two $n \times n$ matrices $A$, $B$ are inverses, one only needs to show $AB = I_n$;
the other condition $BA = I_n$ comes for free).


Q9. Let A, B, C be n x n matrices. 
(a) Show that A is similar to A. 
(b) Show that if A is similar to B, then B is similar to A. 


(c) Show that if A is similar to B, and B is similar to C, then A is
similar to C.


(Incidentally, the above three properties (a)-(c) together mean that 
similarity is an equivalence relation). 


Q10*. Let V be a finite-dimensional vector space, let T : V + V be 
a linear transformation, and let S : V — V be an invertible linear 
transformation. 


(a) Prove that $R(STS^{-1}) = S(R(T))$ and $N(STS^{-1}) = S(N(T))$.
(Recall that $R(T) := \{Tv : v \in V\}$ is the range of $T$, while $N(T) :=
\{v \in V : Tv = 0\}$ is the null space of $T$. Also, for any subspace $W$ of
$V$, recall that $S(W) := \{Sv : v \in W\}$ is the image of $W$ under $S$.)


(b) Prove that $\mathrm{rank}(T) = \mathrm{rank}(STS^{-1})$ and $\mathrm{nullity}(T) =
\mathrm{nullity}(STS^{-1})$. (Hint: use part (a) as well as Q1).




Assignment 6 Due November 14 Covers: Sections 3.1-3.2; 4.1-4.4 
• Q1. Do Question 1(abcdefgi) of Section 3.1 of the textbook.

e Q2. Do Question 1(acdefhi) of Section 3.2 of the textbook. 

e Q3. Do Question 5(e) of Section 3.2 of the textbook. 

e Q4. Do Question 6(ade) of Section 3.2 of the textbook. 

e Q5. Do Question 1(abcdefgh) of Section 4.2 of the textbook. 
e Q6. Do Question 25 of Section 4.2 of the textbook. 


• Q7. Let $U$, $V$, $W$ be finite-dimensional vector spaces, and let $S : V \to
W$ and $T : U \to V$ be linear transformations.

(a) Show that $\mathrm{rank}(ST) \leq \mathrm{rank}(S)$.

(b) Show that $\mathrm{rank}(ST) \leq \mathrm{rank}(T)$.

(c) Show that $\mathrm{nullity}(ST) \geq \mathrm{nullity}(T)$.

(d) Give an example where $\mathrm{nullity}(ST) > \mathrm{nullity}(S)$.

(e) Give an example where $\mathrm{nullity}(ST) < \mathrm{nullity}(S)$.

• Q8. Let $A$ and $B$ be $n \times n$ matrices. Prove (from the definition of
transpose and matrix multiplication) that $(AB)^t = B^t A^t$.

• Q9. Let $A$ be an invertible $n \times n$ matrix. Prove that $\det(A^{-1}) =
1/\det(A)$.

• Q10. Let $A$ and $B$ be invertible $n \times n$ matrices. Show that there is a
sequence of elementary row operations which transforms $A$ to $B$. (Hint:
first show that there is a sequence of row operations which transforms
$A$ to the identity matrix).




Assignment 7 Due November 21 Covers: Sections 4.4; 5.1-5.2 


• Q1. Do Question 1(acdefgijk) of Section 4.4 of the textbook.


e Q2. Do Question 4(g) of Section 4.4 of the textbook. 
e Q3. Do Question 2(a) of Section 5.1 of the textbook. 


e Q4. Do Question 3(a) of Section 5.1 of the textbook. (Treat any 
occurrence of F' as if it were R instead). 


• Q5. Do Question 8 of Section 5.1 of the textbook. (You may assume
that $T : V \to V$ is a linear transformation from some finite-dimensional
vector space $V$ to itself; this is what it means for $T$ to be a linear
transformation “on $V$”).


e Q6*. Do Question 11 of Section 5.1 of the textbook. 
e Q7. Do Question 15 of Section 5.1 of the textbook. 
e Q8. Do Question 3(bf) of Section 5.2 of the textbook. 


e Q9. Let A and B be similar n x n matrices. Show that A and B 
have the same set of eigenvalues (i.e. every eigenvalue of A is also an 
eigenvalue of B and vice versa). 


• Q10*. For this question, the field of scalars will be the complex numbers
$\mathbf{C} := \{x + yi : x, y \in \mathbf{R}\}$ instead of the reals (i.e. all matrices, etc. are
allowed to have complex entries). Let $\theta$ be a real number, and let $A$ be
the $2 \times 2$ rotation matrix

$$A := \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

• (a) Show that $A$ has eigenvalues $e^{i\theta}$ and $e^{-i\theta}$. (You may use Euler's
formula $e^{i\theta} = \cos\theta + i\sin\theta$). What are the eigenvectors corresponding
to $e^{i\theta}$ and $e^{-i\theta}$?

(b) Write $A = QDQ^{-1}$ for some invertible matrix $Q$ and diagonal
matrix $D$ (note that $Q$ and $D$ may have complex entries. Also, there
are several possible answers to this question; you only need to give one
of them).

(c) Let $n \geq 1$ be an integer. Prove that

$$A^n = \begin{pmatrix} \cos n\theta & -\sin n\theta \\ \sin n\theta & \cos n\theta \end{pmatrix}.$$

(You may find the formulae $(e^{i\theta})^n = e^{in\theta} = \cos n\theta + i\sin n\theta$ and
$(e^{-i\theta})^n = e^{-in\theta} = \cos n\theta - i\sin n\theta$ to be useful).

(d) Can you explain why the operator $L_A : \mathbf{R}^2 \to \mathbf{R}^2$ corresponds to
an anti-clockwise rotation of the plane $\mathbf{R}^2$ by angle $\theta$?

(e) Based on (d), can you think of a geometrical interpretation of the
result proven in (c)?




Assignment 8 Due December 5 Covers: Sections 5.2,6.2-6.3 


e Q1. Do Question 8 of Section 6.1 in the textbook. 
e Q2. Do Question 11 of Section 6.1 in the textbook. 
e Q3. Do Question 4 of Section 6.2 in the textbook. 


e Q4. Do Question 13(a) of Section 6.2 in the textbook. (Hint: Use 
Theorem 6 from the Week 9 notes). 


e Q5. Do Question 17(bc) of Section 6.2 of the textbook. 
e Q6. Do Question 18(b) of Section 6.2 of the textbook. 
e Q7. Do Question 2 of Section 6.3 of the textbook. 


• Q8. Let $A$ be an $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$.
Show that $\det(A) = \lambda_1 \lambda_2 \ldots \lambda_n$ and $\mathrm{tr}(A) = \lambda_1 + \lambda_2 + \ldots + \lambda_n$.


• Q9. Let $V$ be a finite-dimensional inner product space, and let $W$ be a
subspace of $V$. Show that $(W^\perp)^\perp = W$; i.e. the orthogonal complement
of the orthogonal complement of $W$ is again $W$.


• Q10*. Find a $2 \times 2$ matrix $A$ which has $(1, 1)$ and $(1, 0)$ as eigenvectors,
is not equal to the identity matrix, and is such that $A^2 = I_2$, where $I_2$
is the $2 \times 2$ identity matrix. (Hint: you might want to use Q7 from last
week’s homework to work out what the eigenvalues of $A$ must be).




Mathematics 115A/3 
Terence Tao 
Final Examination, Dec 10, 2002 




Problem 1. (15 points) Let $W$ be a finite-dimensional real vector space,
and let $U$ and $V$ be two subspaces of $W$. Let $U + V$ be the space

$$U + V := \{u + v : u \in U \text{ and } v \in V\}.$$

You may use without proof the fact that $U + V$ is a subspace of $W$.

(a) (5 points) Show that $\dim(U + V) \leq \dim(U) + \dim(V)$.

(b) (5 points) Suppose we make the additional assumption that $U \cap V =
\{0\}$. Now prove that $\dim(U + V) = \dim(U) + \dim(V)$.

(c) (5 points) Let $U$ and $V$ be two three-dimensional subspaces of $\mathbf{R}^5$.
Show that there exists a non-zero vector $v \in \mathbf{R}^5$ which lies in both $U$ and $V$.
(Hint: Use (b) and argue by contradiction).




Problem 2. (10 points) Let $P_2(\mathbf{R})$ be the space of polynomials of degree
at most 2, with real coefficients. We give $P_2(\mathbf{R})$ the inner product


(f,9) - | f(x)g(x) da. 


You may use without proof the fact that this is indeed an inner product for
$P_2(\mathbf{R})$.
(a) (5 points) Find an orthonormal basis for $P_2(\mathbf{R})$.


Ans.


(b) (5 points) Find a basis for $\mathrm{span}(1, x)^\perp$.


Ans. 




Problem 3. (15 points) Let $P_3(\mathbb{R})$ be the space of polynomials of degree
at most 3, with real coefficients. Let $T : P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the linear
transformation

$$Tf := \frac{df}{dx},$$

thus for instance $T(x^3 + 2x) = 3x^2 + 2$. You may use without proof the
fact that T is indeed a linear transformation. Let $\beta := (1, x, x^2, x^3)$ be the
standard basis for $P_3(\mathbb{R})$.
(a) (5 points) Compute the matrix $[T]_\beta^\beta$.


Ans. 


(b) (3 points) Compute the characteristic polynomial of $[T]_\beta^\beta$.


Ans. 


(c) (5 points) What are the eigenvalues and eigenvectors of T? (Warning:
the eigenvectors of T are related to, but not quite the same as, the
eigenvectors of $[T]_\beta^\beta$.)


Ans. 
(d) (2 points) Is T diagonalizable? Explain your reasoning.




Problem 4. (15 points) This question is concerned with the linear
transformation $T : \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$T(x, y, z, w) = (x + y + z,\; y + 2z + 3w,\; x - z - 2w).$$


You may use without proof the fact that T is a linear transformation. 
(a) (5 points) What is the nullity of T? 


Ans. 


(b) (5 points) Find a basis for the null space. (This basis does not need 
to be orthogonal or orthonormal). 


Ans. 


(c) (5 points) Find a basis for the range. (This basis does not need to be 
orthogonal or orthonormal). 


Ans. 




Problem 5. (10 points) Let V be a real vector space, and let $T : V \to V$
be a linear transformation such that $T^2 = T$. Let R(T) be the range of T
and let N(T) be the null space of T.

(a) (5 points) Prove that $R(T) \cap N(T) = \{0\}$.


(b) (5 points) Let R(T) + N(T) denote the space

$$R(T) + N(T) := \{ x + y : x \in R(T) \text{ and } y \in N(T) \}.$$

Show that R(T) + N(T) = V. (Hint: First show that for any vector $v \in V$,
the vector $v - Tv$ lies in the null space N(T)).
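
To make the setting of Problem 5 concrete, here is a minimal sketch that checks the hint numerically on one hypothetical example. The matrix `P`, the vector `v`, and the helpers `mat_vec` and `mat_mul` are assumptions introduced for illustration; a single example illustrates the claim but does not prove it.

```python
def mat_vec(P, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]

def mat_mul(P, Q):
    """Multiply two square matrices of the same size."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical transformation on R^2 with T^2 = T (an oblique projection).
P = [[1, 1],
     [0, 0]]
assert mat_mul(P, P) == P

v = [3, 4]
Tv = mat_vec(P, v)                     # the component of v in the range R(T)
w = [v[i] - Tv[i] for i in range(2)]   # the candidate null-space component v - Tv
T_w = mat_vec(P, w)                    # should be the zero vector

print(Tv, w, T_w)
```

In this example the decomposition is v = Tv + (v - Tv), with the first term in R(T) and the second in N(T).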




Problem 6. (15 points) Let A be the matrix

$$A := \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$

(a) (5 points) Find a complex invertible matrix Q and a complex diagonal
matrix D such that $A = QDQ^{-1}$. (Hint: A has $-1$ as one of its eigenvalues).


Ans. 


(b) (5 points) Find three elementary matrices $E_1, E_2, E_3$ such that
$A = E_1 E_2 E_3$. (Note: this problem is not directly related to (a)).


Ans. 


(c) (5 points) Compute $A^{-1}$, by any means you wish.


Ans. 




Problem 7. (10 points) Let f, g be continuous, complex-valued functions
on $[-1, 1]$ such that $\int_{-1}^{1} |f(x)|^2\, dx = 9$ and $\int_{-1}^{1} |g(x)|^2\, dx = 16$.

(a) (5 points) What possible values can $\int_{-1}^{1} f(x) \overline{g(x)}\, dx$ take? Explain
your reasoning.


Ans. 


(b) (5 points) What possible values can $\int_{-1}^{1} |f(x) + g(x)|^2\, dx$ take? Explain
your reasoning.


Ans. 




Problem 8. (10 points) Find a $2 \times 2$ matrix A with real entries which
has trace 5, determinant 6, and has $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ as one of its eigenvectors. (Hint:
First work out what the characteristic polynomial of A must be. There are
several possible answers to this question; you only have to supply one of
them.)


Ans. 




